Search code examples
c#xmlpreorderpostorder

All elements order in XML preorder and postorder traversal in c#


I need a function to return preorder and postorder of all elements in C# which returns me a list of (XElement, Preorder , Postorder) for all Element. How can I do this?

for example, with this XML :

<?xml version='1.0' encoding='ISO-8859-1' ?>

<!--
  XML-Page generated by PENELOPE.
  Copyright (c) 1999 Araneus Group and University 'Roma Tre', Rome, ITALY
  All rights reserved.
-->
<SigmodRecord>
    <issue>
        <volume>11</volume>
        <number>1</number>
        <articles>
            <article>
                <title>Architecture of Future Data Base Systems</title>
                <authors>
                    <author position="00">Lawrence A. Rowe</author>
                    <author position="01">Michael Stonebraker</author>
                </authors>
                <initPage>30</initPage>
                <endPage>44</endPage>

            </article>
            <article>
                <title>Multisafe - A Data Security Architecture</title>
                <authors>
                    <author position="00">Robert P. Trueblood</author>
                    <author position="01">ii. Rex Hartson</author>
                </authors>
                <initPage>45</initPage>
                <endPage>63</endPage>
            </article>
        </articles>
    </issue>
    <issue>
        <volume>12</volume>
        <number>2</number>
        <articles>
            <article>
                <title>Comparison and Mapping of the Relational and CODASYL Data Models</title>
                <authors>
                    <author position="00">Gary H. Sockut</author>
                </authors>
                <initPage>55</initPage>
                <endPage>68</endPage>
            </article>
        </articles>
    </issue>
</SigmodRecord>

I need this answer :

Element = <SigmodRecord>,preorder = 1, postorder = 29
Element = <SigmodRecord/issue>,preorder = 2, postorder = 18
.
.
.

I have written this class but it works slow in large XML files because for each element it proceed all of nodes!

class CTraversal
{
    private int preorderID = 0;
    private int postorderID = 0;

    public int PreOrderTraversal(XElement RootNode, XElement CurrentNode)
    {
        if (CurrentNode == RootNode)
        {
            return ++preorderID;
        }
        if (RootNode.DescendantNodes().Count() > 1)
        {
            foreach (var child in RootNode.Elements())
            {
                if (PreOrderTraversal(child, CurrentNode) != -1) return preorderID;
            }
        }
        return -1;
    }

    public int PostOrderTraversal(XElement RootNode, XElement CurrentNode)
    {
        if (RootNode.DescendantNodes().Count() > 1)
        {
            foreach (var child in RootNode.Elements())
            {
                if (PostOrderTraversal(child, CurrentNode) != -1) return postorderID;
            }
        }
        if (CurrentNode == RootNode)
        {
            return ++postorderID;
        }
        ++postorderID;
        return -1;
    }
}

Solution

  • So handling a pre-order tree traversal is the easier of the two, so we'll start there.

    The method will accept a sequence of items, and a function that accepts an item and translates it into a sequence of items, this lets us get all children for any given node.

    Then in our method we'll have a stack of iterators, pushing the starting sequence's iterator into the stack to begin.

    From there we loop as long as there are any iterators to process. We get the current item to process, if it has children we yield its current item, push the iterator back onto the stack for later processing, fetch the sequence of children, push them onto the stack for processing.

    public static IEnumerable<T> PreOrder<T>(
        IEnumerable<T> source,
        Func<T, IEnumerable<T>> childSelector)
    {
        var stack = new Stack<IEnumerator<T>>();
        stack.Push(source.GetEnumerator());
        try
        {
            while (stack.Any())
            {
                var current = stack.Pop();
                if (current.MoveNext())
                {
                    stack.Push(current);
                    var children = childSelector(current.Current);
                    stack.Push(children.GetEnumerator());
                    yield return current.Current;
                }
                else
                {
                    current.Dispose();
                }
            }
        }
        finally
        {
            foreach (var iterator in stack)
                iterator.Dispose();
        }
    }
    

    This covers our pre-order algorithm of, for each item to process, yield it, then process all child items.

    Here's a test case to demonstrate it. It's an XML document in which each node has a value representing the order that it should be yielded from a pre order traversal:

    string preOrderExample =
    @"<node value='1'>
      <node value='2'>
        <node value='3'/>
        <node value='4'/>
        <node value='5'/>
      </node>
      <node value='6'>
        <node value='7'>
          <node value='8'/>
          <node value='9'>
            <node value='10'/>
          </node>
        </node>
        <node value='11'/>
      </node>
    </node>";
    
    var preOrderDoc = XDocument.Parse(preOrderExample);
    var query = PreOrder(new[] { preOrderDoc.Root }, node => node.Elements());
    foreach (var node in query)
        Console.WriteLine(node.Attribute("value").Value);
    

    This will print out 1, 2, 3, ..., 11, indicating they're being processed in the right order.

    Post order is very similar, but needs a bit of tweaking. Basically what we want to do is have exactly the same algorithm, but to yield each item after processing all of its children. The first thing to do is copy-paste the same algorithm and just remove the `yield. Now, where, in our algorithm, do we know where all of an item's children have been processed?

    Well, just after we pop an iterator off of the stack, the Current item of that iterator has just had all of its items processed. Well, that is, unless we haven't yet called MoveNext on that iterator at all, in which case there is no current item. So, if there is an item, we should yield it.

    Sadly, there's no way, given an IEnumerator, to know if it's had MoveNext called on it yet. We could create a new type that wraps an IEnumerator and keeps track of whether or not it's been started. This is probably actually a good idea, but since this is only used in this one method I cheated a bit and just used a Tuple<IEnumerator<T>, bool> to hold onto a sequence that may or may not have been started, replacing all usage of IEnumerator<T> in the method with that Tuple, and setting the boolean based on whether or not we know that we've asked that iterator for a value.

    public static IEnumerable<T> PostOrder<T>(
        IEnumerable<T> source,
        Func<T, IEnumerable<T>> childSelector)
    {
        var stack = new Stack<Tuple<IEnumerator<T>, bool>>();
        stack.Push(Tuple.Create(source.GetEnumerator(), false));
        try
        {
            while (stack.Any())
            {
                var current = stack.Pop();
                if (current.Item2)
                    yield return current.Item1.Current;
                if (current.Item1.MoveNext())
                {
                    stack.Push(Tuple.Create(current.Item1, true));
                    var children = childSelector(current.Item1.Current);
                    stack.Push(Tuple.Create(children.GetEnumerator(), false));
                }
                else
                {
                    current.Item1.Dispose();
                }
            }
        }
        finally
        {
            foreach (var iterator in stack)
                iterator.Item1.Dispose();
        }
    }
    

    Finally we have a test case that has the nodes' value ordered such that they'll print out 1, 2, 3, ... if they're written in post order:

    string postOrderExample =
    @"<node value='11'>
      <node value='4'>
        <node value='1'/>
        <node value='2'/>
        <node value='3'/>
      </node>
      <node value='10'>
        <node value='8'>
          <node value='5'/>
          <node value='7'>
            <node value='6'/>
          </node>
        </node>
        <node value='9'/>
      </node>
    </node>";
    
    var postOrderDoc = XDocument.Parse(postOrderExample);
    query = PostOrder(new[] { postOrderDoc.Root }, node => node.Elements());
    foreach (var node in query)
        Console.WriteLine(node.Attribute("value").Value);