XML encoding question

Post by joe lipso » Thu, 23 May 2002 14:09:53



I've got some XML that contains non-ASCII chars in CDATA sections. Both IE6
and the XML library I'm using (Perl XML::Simple) spew errors when they get to
the non-ASCII chars.

The default encoding is UTF-8, right? How do I know if a char is UTF-8 or an
invalid binary char?

One of the characters it is failing on is a degree symbol (as in degrees
Celsius). Will I have to create an entity reference for this, or use the
Unicode &#160; format?

thanks

Joe

 
 
 

XML encoding question

Post by Philippe Poulard » Thu, 23 May 2002 18:05:27



> I've got some XML that contains non ASCII chars in CDATA sections, both IE6
> and the XML library I'm using (perl XML::Simple) spew when they get to the
> non ASCII chars.

> The default encoding is UTF-8 right?

RIGHT

> how do I know if a char is UTF-8 or an
> invalid binary char?

SEE THE TABLE BELOW.
THE PARSER WORKS THAT OUT FOR YOU (YOU DON'T HAVE TO CARE ABOUT IT)

> one of the characters it is failing on is a degree (as in degrees Celsius)
> symbol, will I have to create an entity reference for this or use the
> Unicode &#160; format .

USING A CHARACTER REFERENCE TO THE UNICODE CHAR OR USING THAT CHAR DIRECTLY
IS THE SAME
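
A minimal sketch of that equivalence, using Python's standard xml.etree.ElementTree rather than the Perl XML::Simple from the question (the element name `t` and the sample text are made up for illustration). One caveat worth checking for the original problem: inside a CDATA section a character reference is not expanded, so there only the literal character works.

```python
import xml.etree.ElementTree as ET

# The degree sign written literally (as UTF-8 bytes) and as a numeric
# character reference. Both should parse to the same text.
literal = ET.fromstring("<t>25\u00b0C</t>".encode("utf-8"))
reference = ET.fromstring(b"<t>25&#176;C</t>")
same_text = literal.text == reference.text

# Inside CDATA the markup is taken verbatim, so the reference stays
# as the six literal characters "&#176;".
cdata = ET.fromstring(b"<t><![CDATA[25&#176;C]]></t>")
cdata_text = cdata.text

print(same_text)
print(cdata_text)
```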

> thanks

> Joe

Hi Joe,

&#160; stands for non-breaking space
&#176; stands for degree char

Unicode address                    UTF-8 sequence
from 0000 0000 to 0000 007F        0xxxxxxx
from 0000 0080 to 0000 07FF        110xxxxx 10xxxxxx
from 0000 0800 to 0000 FFFF        1110xxxx 10xxxxxx 10xxxxxx
from 0001 0000 to 001F FFFF        11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
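
The bit patterns in the table can be turned into code fairly directly. The following is only a sketch: the function name `utf8_encode` is invented here, it does not reject surrogates, and it accepts the table's upper limit of 1FFFFF even though Unicode itself stops at 10FFFF.

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point by filling the x bits of the table rows."""
    if code_point < 0 or code_point > 0x1FFFFF:
        raise ValueError("code point outside the range covered by the table")
    if code_point <= 0x7F:
        # 0xxxxxxx
        return bytes([code_point])
    if code_point <= 0x7FF:
        # 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point <= 0xFFFF:
        # 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


degree = utf8_encode(0xB0)  # the degree sign falls in the second row
print(degree.hex())
```

So the degree sign (code point B0) needs two bytes in UTF-8; a lone B0 byte in the file is not valid UTF-8.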

Note that a UTF-8 encoded document can never contain the bytes FE or FF.

See RFC 2279 for more.
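
If you want to check a file yourself before handing it to the parser, one possible approach is to try decoding the raw bytes as UTF-8 and report where it first fails. This is a sketch with a made-up helper name (`first_invalid_utf8_offset`); the sample data assumes, without knowing the actual file, that the degree sign was saved as a single ISO-8859-1 byte.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Assumed sample data: the same text saved in two different encodings.
latin1_bytes = "25\u00b0C".encode("latin-1")  # degree sign is the single byte B0
utf8_bytes = "25\u00b0C".encode("utf-8")      # degree sign is the pair C2 B0

print(first_invalid_utf8_offset(latin1_bytes))
print(first_invalid_utf8_offset(utf8_bytes))
```

A non-None result on your document would suggest it was saved in some other encoding, which then has to be declared or converted.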

I think you should declare the encoding you use in the XML declaration of
your document, even if it is UTF-8.
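
To illustrate why the declaration matters, here is a small sketch using Python's standard xml.etree.ElementTree (not the Perl library from the question). It assumes the document was saved as ISO-8859-1; the element name `t` is invented for the example.

```python
import xml.etree.ElementTree as ET

# Assumed situation: the text was saved as ISO-8859-1, so the degree
# sign is the single byte B0.
body = "<t>25\u00b0C</t>".encode("iso-8859-1")

# With a matching encoding declaration the parser decodes the byte correctly.
declared = b'<?xml version="1.0" encoding="ISO-8859-1"?>' + body
declared_text = ET.fromstring(declared).text

# Without a declaration the parser assumes UTF-8 and rejects the lone B0 byte.
try:
    ET.fromstring(body)
    undeclared_ok = True
except ET.ParseError:
    undeclared_ok = False

print(declared_text == "25\u00b0C")
print(undeclared_ok)
```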

--
Cordialement,

           ///
          (. .)
 -----ooO--(_)--Ooo-----
|   Philippe Poulard    |
 -----------------------

 
 
 

1. To encode, or not to encode - That is the question.

Can someone please explain once and for all how encoding works with XSL
transformations? When does the ASP code page change the output, when does the
encoding attribute in the XML doc change it, and how does the XSL determine the
encoding to use?

When I use transformNode with response.write I get different results from using
transformNodeToObject (IStream:Response). In fact, using the latter, Netscape seems
to display all sorts of garbage even with all of the above set to standard ASCII.
Try it for yourself!

Help would be very much appreciated - this has been going on too long now!

Regards.

2. flush

3. Urgent question: XML parser and encodings (f.e. Japanese)

4. *** VIENNA 2.0 on ITALIAN MIDI

5. XML encoding type for RTF data (ANSI encoding)

6. SCSI cable Needed for 40Meg HD

7. XML -"Switch from current encoding to specified encoding not supported."...????using hebrew

8. Epson EPL-N1600

9. How can I get the output encoding from the xml encoding

10. UTF-16 encoding questions

11. UTF-8 encoding question

12. UTF Encoding Question

13. Encoding question