Please educate me regarding XML encoding and XML parsers


Post by wenma » Fri, 11 Apr 2003 00:41:41



Hi,
I have some questions regarding the XML encoding scheme and parsing XML
files:
1. In XML books, five special chars (entities) are mentioned:
<, >, &, ' and "
I am dealing with XML files that may contain those five special chars.
Are they encoded if encoding="UTF-8" is specified, e.g., "<" to
"&lt;"? I manually created an XML file on UNIX whose content
contained some of those special chars, and it caused Xerces's C++ SAX
parser to throw an exception.
2. How does Xerces's parser decode those encoded special chars back to
normal chars when it calls the document handler's characters() function?
3. How can I "configure" the Xerces SAX parser's characters() function to
retrieve content of any size dynamically? What is the maximum size that
characters() can retrieve?
Thanks.

Post by Martin Honnen » Fri, 11 Apr 2003 18:01:43



> Hi,
> I have some questions regarding the XML encoding scheme and parsing XML
> files:
> 1. In XML books, five special chars (entities) are mentioned:
> <, >, &, ' and "
> I am dealing with XML files that may contain those five special chars.
> Are they encoded if encoding="UTF-8" is specified, e.g., "<" to
> "&lt;"? I manually created an XML file on UNIX whose content
> contained some of those special chars, and it caused Xerces's C++ SAX
> parser to throw an exception.

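There are rules that an XML document has to obey, whatever encoding it
declares. The characters < and & can only be used literally as markup,
not as character data; a literal < or & in text makes the document not
well-formed, which is the likely reason the parser threw an exception.
When you need those characters in character data, you have to encode
them yourself with entity references. Declaring encoding="UTF-8" does
not do this for you: the encoding only determines how the characters
are stored as bytes.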
Thus you need
    <root>&lt;</root>
or
    <root>&amp;</root>

--

        Martin Honnen
        http://javascript.faqts.com/