q: xml parsing, need some non-trivial thing (c#)...

Post by Yury Zenkevic » Sat, 21 Dec 2002 20:12:44



Hi all!

I've ended up in a situation where I need to parse some XML sequentially and
leave the content unmodified until a certain tag is found. Until this special
tag is found, all the input should go to the output without modification. For
example:

<tag1>
    <tag2>
        <value>1</value>
        <value>2</value>
    </tag2>
    <special>
        <param>oops</param>
    </special>
    <tag3>...</tag3>
</tag1>

I need to parse this file without keeping it in memory, writing it out to
some stream until I find the "special" tag. Then I have to use an internal
routine to process this tag and replace its content with some external
content, and then process the rest of the input XML until another "special"
tag is found or EOF is reached.

Is there a more "elegant" way to do this?

P.S. I thought of XmlTextReader.GetRemainder(), but that only covers the
transition AFTER a tag has been found; I can't get the "raw" XML data that
comes before the tag...

Thank you,

YZ.

 
 
 


Post by Tobin Harri » Sat, 21 Dec 2002 21:29:27


We recently did something similar for a content management system where
users could embed xml instruction tags into the html, and then the .NET dll
would interpret these instructions into output, but leave existing tags in
place.

The basic approach was this.

- recursively loop through all XmlNodes of the XmlDocument
- for each node, check its 'name' and see if it is a special one.
- if it is, pass the node into the appropriate Handle function, which
examines the node, does some processing, and returns a string of the output.
- if it isn't a special node, output its 'outer xml', which will just dump
it back into the stream.

To be honest, our application required a little more than this. We had
Handler objects for each node type, each of which just had a 'Handle' method.
We also passed contextual information down the recursive call, so that the
Handlers were aware of any information they needed for processing. I don't
know if this is what you need, but it may help.
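
The approach described above might be sketched roughly like this. It is only an illustration, not the original application's code: the names DomRenderer, Handler and Render are made up here, the context is assumed to be a plain string dictionary, and XmlWriter.Create comes from later .NET versions than the one current when this thread was written. Note that the whole XmlDocument is held in memory. To catch special tags nested inside ordinary elements, the sketch writes each ordinary element's start tag, recurses into its children, and then closes it, rather than dumping its outer XML in one go.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class DomRenderer
{
    // Hypothetical handler shape: receives the special element plus shared
    // context and returns the markup that should replace it.
    public delegate string Handler(XmlElement element, IDictionary<string, string> context);

    public static string Render(XmlDocument doc,
                                IDictionary<string, Handler> handlers,
                                IDictionary<string, string> context)
    {
        var output = new StringWriter();
        var settings = new XmlWriterSettings { OmitXmlDeclaration = true };
        using (XmlWriter writer = XmlWriter.Create(output, settings))
        {
            RenderNode(doc.DocumentElement, writer, handlers, context);
        }
        return output.ToString();
    }

    private static void RenderNode(XmlNode node, XmlWriter writer,
                                   IDictionary<string, Handler> handlers,
                                   IDictionary<string, string> context)
    {
        XmlElement element = node as XmlElement;
        if (element == null)
        {
            // Text, comments, CDATA and so on are copied through unchanged.
            node.WriteTo(writer);
            return;
        }

        Handler handler;
        if (handlers.TryGetValue(element.Name, out handler))
        {
            // Special element: the handler's output replaces the whole subtree.
            writer.WriteRaw(handler(element, context));
            return;
        }

        // Ordinary element: copy the tag and attributes, then recurse so that
        // special elements deeper in the tree are still found.
        writer.WriteStartElement(element.Prefix, element.LocalName, element.NamespaceURI);
        foreach (XmlAttribute attribute in element.Attributes)
        {
            attribute.WriteTo(writer);
        }
        foreach (XmlNode child in element.ChildNodes)
        {
            RenderNode(child, writer, handlers, context);
        }
        writer.WriteEndElement();
    }
}
```

A caller would load the document with XmlDocument.LoadXml or Load, register one Handler per special tag name in the dictionary, and pass whatever context the handlers need.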

Tobin



Post by Yury Zenkevic » Sat, 21 Dec 2002 21:39:48


Hi Tobin,

This is exactly the situation I'm in: HTML code with some special tags in it
:)
But your solution keeps all the data in memory while rendering, which I can't
do. My data has to be processed in a "streaming" way, because otherwise it can
take a lot of time and memory...

So far I haven't found anything better than seeking these tags "manually", but
that approach is very limited when it comes to the special tag hierarchy
(recursive tags etc.). And I don't want to write a whole new XML reader just
for that... :)

Thank you anyway,

YZ.



Post by Nicholas Paldino [.NET/C# MVP] » Sat, 21 Dec 2002 22:27:53


Yury,

    If you need to process this using a stream then you will not be able to
do it with the framework classes.

    Perhaps you can cycle through your document using a stream-based
approach, gathering the data you need for each replace.  Once you do that,
perhaps you can create an XSLT stylesheet to do the transformation for you.
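
For illustration only, a forward-only pass over the document might look something like the sketch below. It is written against the XmlReader and XmlWriter factory methods from later .NET versions than the one current when this thread was written. The names StreamingFilter, Copy and the replace callback are hypothetical. It assumes the special element is not the document element and that the input has no DOCTYPE. Ordinary nodes are copied one at a time. Only each special subtree is buffered (via ReadOuterXml), so memory use depends on how large those subtrees are.

```csharp
using System;
using System.IO;
using System.Xml;

public static class StreamingFilter
{
    // Copies input to output node by node. Every element named specialName is
    // read as a string and handed to 'replace' (a hypothetical callback); the
    // returned markup is written in its place.
    public static void Copy(TextReader input, TextWriter output,
                            string specialName, Func<string, string> replace)
    {
        using (XmlReader reader = XmlReader.Create(input))
        using (XmlWriter writer = XmlWriter.Create(output))
        {
            reader.Read();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.LocalName == specialName)
                {
                    // ReadOuterXml consumes the subtree and leaves the reader
                    // on the node that follows it, so no extra Read() here.
                    string fragment = reader.ReadOuterXml();
                    writer.WriteRaw(replace(fragment));
                }
                else
                {
                    WriteShallow(reader, writer);
                    reader.Read();
                }
            }
        }
    }

    // Writes the current node only (not its children) to the writer.
    private static void WriteShallow(XmlReader reader, XmlWriter writer)
    {
        switch (reader.NodeType)
        {
            case XmlNodeType.Element:
                writer.WriteStartElement(reader.Prefix, reader.LocalName, reader.NamespaceURI);
                writer.WriteAttributes(reader, true);
                if (reader.IsEmptyElement) writer.WriteEndElement();
                break;
            case XmlNodeType.EndElement:
                writer.WriteFullEndElement();
                break;
            case XmlNodeType.Text:
                writer.WriteString(reader.Value);
                break;
            case XmlNodeType.Whitespace:
            case XmlNodeType.SignificantWhitespace:
                writer.WriteWhitespace(reader.Value);
                break;
            case XmlNodeType.CDATA:
                writer.WriteCData(reader.Value);
                break;
            case XmlNodeType.Comment:
                writer.WriteComment(reader.Value);
                break;
            case XmlNodeType.ProcessingInstruction:
                writer.WriteProcessingInstruction(reader.Name, reader.Value);
                break;
            // The XML declaration is skipped; the writer emits its own.
        }
    }
}
```

Special tags nested inside another special tag arrive at the callback as part of the same fragment, so the callback decides how to treat them.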

    Hope this helps.

--
               - Nicholas Paldino [.NET/C# MVP]



Post by Yury Zenkevic » Sat, 21 Dec 2002 22:46:50


Can you please say a bit more about "cycle .... using a stream-based
approach"?
And is it possible in XSLT to output unprocessed tags "as is", i.e. without
modifying their content, just writing out the raw data? I don't think so...
Hmm, I mean, when you process with XSLT, you cycle through the XML tags, but
you have no control over how the other tags are rendered, right?

Take the example I sent at the start. Can you create an XSLT stylesheet that
does what I've been talking about, i.e. leaves the whole document the way it
is but replaces some tags with other content?

Thanks!




Post by Martin Honnen » Sun, 22 Dec 2002 00:35:05



> Can you please say a bit more about "cycle .... using a stream-based
> approach"?
> And is it possible in XSLT to output unprocessed tags "as is", i.e. without
> modifying their content, just writing out the raw data? I don't think so...
> Hmm, I mean, when you process with XSLT, you cycle through the XML tags, but
> you have no control over how the other tags are rendered, right?

> Take the example I sent at the start. Can you create an XSLT stylesheet that
> does what I've been talking about, i.e. leaves the whole document the way it
> is but replaces some tags with other content?

That is easy, consider

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

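<xsl:template match="special">
   <newSpecial>
     <xsl:apply-templates select="@* | node()" />
   </newSpecial>
</xsl:template>

<xsl:template match="@* | node()">
   <xsl:copy>
     <xsl:apply-templates select="@* | node()" />
   </xsl:copy>
</xsl:template>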

</xsl:stylesheet>

an input of

<?xml version="1.0" encoding="UTF-8"?>
<tag1>
     <tag2>
         <value>1</value>
         <value>2</value>
     </tag2>
     <special>
         <param>oops</param>
     </special>
     <tag3>...</tag3>
</tag1>

is transformed to

<?xml version="1.0" encoding="UTF-8"?>

<tag1>
     <tag2>
         <value>1</value>
         <value>2</value>
     </tag2>
     <newSpecial>
         <param>oops</param>
     </newSpecial>
     <tag3>...</tag3>
</tag1>

All nodes are copied except <special> elements, which are replaced by
<newSpecial> elements (while the content of <special> is copied over).
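
From C#, a stylesheet along these lines could be applied roughly as sketched below. This is an illustration under assumptions: it uses XslCompiledTransform, which belongs to later .NET versions than the one current when this thread was written, the class and method names are made up, and the stylesheet is embedded as a string only to keep the example self-contained. XslCompiledTransform builds an in-memory tree of the input, so this is not a streaming solution.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class IdentityTransformDemo
{
    // Identity transform plus one override for <special>, embedded here
    // purely for the sake of a self-contained example.
    private const string Stylesheet =
        @"<xsl:stylesheet version=""1.0""
              xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
            <xsl:template match=""special"">
              <newSpecial><xsl:apply-templates select=""@* | node()""/></newSpecial>
            </xsl:template>
            <xsl:template match=""@* | node()"">
              <xsl:copy><xsl:apply-templates select=""@* | node()""/></xsl:copy>
            </xsl:template>
          </xsl:stylesheet>";

    public static string Apply(string inputXml)
    {
        var xslt = new XslCompiledTransform();
        using (XmlReader styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        var output = new StringWriter();
        using (XmlReader input = XmlReader.Create(new StringReader(inputXml)))
        using (XmlWriter writer = XmlWriter.Create(output, xslt.OutputSettings))
        {
            // The more specific "special" template wins over the generic copy.
            xslt.Transform(input, writer);
        }
        return output.ToString();
    }
}
```

Calling IdentityTransformDemo.Apply with the sample input returns the document with each <special> element renamed to <newSpecial> and everything else copied.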
--

        Martin Honnen
        http://JavaScript.FAQTs.com/

 
 
 


Post by Yury Zenkevic » Sun, 22 Dec 2002 01:22:00


Wow! That was easy :) The only thing that saves me from total shame is that
I've only been using XSL for a month... Hmm... That was obvious :)

THANK YOU !

:)


