Tags: c#, enums, enumerable, bridge.net

Unflattening a list to sub-objects


Edit: I realize it may look like I came straight here for answers. I did try to solve this myself, but I have come to believe there is a mechanism I do not fully understand. I just can't wrap my head around this problem!

Edit 2: I misused some words! By "parent" and "children" I do not mean them in the DOM sense. Here is the HTML used for my current development:

<body>
    <h1>FIRST LEVEL TITLE</h1>
    <h4>test</h4>
    <h2>SECOND LEVEL TITLE</h2>
    <h3>THIRD LEVEL TITLE</h3>
    <h4>test</h4>
    <h3>THIRD LEVEL TITLE</h3>
    <h3>THIRD LEVEL TITLE</h3>
    <h2>SECOND LEVEL TITLE</h2>
    <h3>THIRD LEVEL TITLE</h3>
    <h3>THIRD LEVEL TITLE</h3>
    <h4>test</h4>
    <h4>test</h4>
</body>

The "parent" and "child" hierarchy I want to create is purely virtual: it only exists inside my library, not in the HTML. My title tags are not nested.


Using Bridge.NET and Bridge.JQuery, I have successfully retrieved the complete list of title tags (h1, h2 and so on) from the HTML and stored them in a flat list. Now I am trying to give this list a hierarchy, so that every element in my list has a "Children" property containing all the elements directly below it, those sub-elements containing their own sub-elements, and so on.

An element is a direct child of the closest preceding element with a higher level (a smaller heading number); there can be no element of an intermediate level between it and its parent.

In the list H2, H3, H4, H4 is a child of H3, which is a child of H2. But in the list H2, H4, H3, both H3 and H4 are children of H2.
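Stated as code, the rule is: a title's parent is the closest earlier title with a smaller heading number. The following is a minimal C# sketch, not part of the original question, that assumes levels are plain integers (1 for h1 up to 6 for h6) instead of the `TitleLevel` enum; `ParentOf` is a hypothetical helper:

```csharp
using System;

// Direct transcription of the rule: walk backwards from `index` and return the first
// title whose heading number is smaller (h2 outranks h3). Returns -1 when there is none.
// Assumption: levels are plain ints, 1 = h1 ... 6 = h6.
int ParentOf(int[] levels, int index)
{
    for (int j = index - 1; j >= 0; j--)
        if (levels[j] < levels[index])
            return j;
    return -1;
}

int[] a = { 2, 3, 4 }; // H2, H3, H4
int[] b = { 2, 4, 3 }; // H2, H4, H3
Console.WriteLine($"{ParentOf(a, 1)} {ParentOf(a, 2)}"); // 0 1
Console.WriteLine($"{ParentOf(b, 1)} {ParentOf(b, 2)}"); // 0 0
```

This scans backwards for every title, so it is only meant to pin down the rule, not to be an efficient implementation.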

example:

H1 - FIRST LEVEL TITLE
H4 - test
H2 - SECOND LEVEL TITLE
H3 - THIRD LEVEL TITLE
H4 - test
H3 - THIRD LEVEL TITLE
H3 - THIRD LEVEL TITLE
H2 - SECOND LEVEL TITLE
H3 - THIRD LEVEL TITLE
H3 - THIRD LEVEL TITLE
H4 - test
H4 - test

becomes

H1 - FIRST LEVEL TITLE
--->H4 - test
--->H2 - SECOND LEVEL TITLE
--->--->H3 - THIRD LEVEL TITLE
--->--->--->H4 - test
--->--->H3 - THIRD LEVEL TITLE
--->--->H3 - THIRD LEVEL TITLE
--->H2 - SECOND LEVEL TITLE
--->--->H3 - THIRD LEVEL TITLE
--->--->H3 - THIRD LEVEL TITLE
--->--->--->H4 - test
--->--->--->H4 - test

Each title in the flat list is defined as follows (any modification can be made if it helps achieve this):

internal class Title
{
    public TitleLevel Level { get; set; }
    public string Value { get; set; }
    public IEnumerable<Title> Children { get; set; }  

    /* removed irrelevant code */
}

internal enum TitleLevel
{
    H6,
    H5,
    H4,
    H3,
    H2,
    H1,
    DummyValue // exists for technical reasons, I may ask for advice on that later
}

Solution

  • So here is my solution to what I wanted (un-flattening a list).

    The code really isn't pretty, but it works.

    It is called like this:

    var hierarchyList = GetTitleHierarchy(flatList);
    

    And here is the definition:

    /// <summary>
    /// Turns a flat list of titles into a list where each element contains a list of its children, themselves containing a list of children, and so on
    /// </summary>
    /// <param name="titles">A flat list of titles</param>
    /// <returns>A "cascading" list of titles</returns>
    internal List<Title> GetTitleHierarchy(List<Title> titles)
    {
        return PutInParent(titles);
    }
    
    private List<Title> PutInParent(List<Title> titles)
    {
        var output = new List<Title>();
    
        for (int i = 0; i < titles.Count; i++)
        {
            // Work on a copy instead of the instance from the flat list (reusing it led to loop references)
            var title = titles[i].Copy();
    
            // The next childrenCount titles belong under this one: nest them recursively
            var childrenCount = CountChildren(titles, i);
            if (childrenCount > 0)
            {
                var subItems = titles.GetRange(i + 1, childrenCount);
                title.Children = PutInParent(subItems);
            }
    
            output.Add(title);

            // Skip the titles that were just placed under this one
            i += childrenCount;
        }
    
        return output;
    }
    
    /// <summary>
    /// Returns the number of titles after startIndex that belong under the title at startIndex (its children and all their descendants)
    /// </summary>
    /// <param name="titles">the flat list of elements</param>
    /// <param name="startIndex">the current position in the list</param>
    /// <returns>The length of the run of titles below the current one</returns>
    internal int CountChildren(List<Title> titles, int startIndex)
    {
        var parentLevel = titles[startIndex].Level;
        var childrenCount = 0;
    
        foreach (var title in titles.Skip(startIndex + 1))
        {
            // The run ends at the first title of the same or a higher rank (e.g. an H2 after an H3)
            if (title.IsLevelLowerThan(parentLevel))
                break;
    
            childrenCount++;
        }
    
        return childrenCount;
    }
    
    internal class Title
    {
        public TitleLevel Level { get; set; }
        public string Value { get; set; }
        public IEnumerable<Title> Children { get; set; }
    
        #region Overrides of Object
    
        public override string ToString()
        {
            return $"{Level} - {Value}";
        }
    
        #endregion
    
        public Title Copy()
        {
            return new Title
            {
                Level = Level,
                Value = Value,
                // Children is null for titles coming from the flat list; ToList() makes the copy eager
                Children = Children?.Select(c => c.Copy()).ToList()
            };
        }
    
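        // True when this title ranks the same as or higher than targetLevel (e.g. H2 vs H3),
        // so it cannot be placed underneath a title of targetLevel
        public bool IsLevelLowerThan(TitleLevel targetLevel)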
        {
            return (int) Level <= (int) targetLevel;
        }
    }
    
    internal enum TitleLevel
    {
        H1 = 1,
        H2 = 2,
        H3 = 3,
        H4 = 4,
        H5 = 5,
        H6 = 6,
        DummyValue = 0
    }
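For comparison, the same hierarchy can also be built in a single pass with a stack of "open" titles, instead of counting runs and recursing. This is a different technique from the solution above, not the author's code. The sketch is hypothetical: `Unflatten` takes a level selector and a callback that attaches a child to its parent, and the demo uses list indices with plain integer levels (1 = h1) as stand-ins for `Title` objects:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: single-pass, stack-based unflattening.
// levelOf: 1 for h1 ... 6 for h6. addChild(parent, child) attaches child under parent.
List<T> Unflatten<T>(IEnumerable<T> flat, Func<T, int> levelOf, Action<T, T> addChild)
{
    var roots = new List<T>();
    var open = new Stack<T>(); // current chain of ancestors, innermost on top

    foreach (var item in flat)
    {
        // Titles of the same or a deeper level cannot contain this one: close them.
        while (open.Count > 0 && levelOf(open.Peek()) >= levelOf(item))
            open.Pop();

        if (open.Count > 0)
            addChild(open.Peek(), item); // closest earlier title with a smaller number
        else
            roots.Add(item);             // nothing above it: top-level title

        open.Push(item);
    }
    return roots;
}

// Demo: indices 0..11 stand in for Title objects; levels come from the question's HTML.
int[] levels = { 1, 4, 2, 3, 4, 3, 3, 2, 3, 3, 4, 4 };
var childrenOf = new Dictionary<int, List<int>>();

List<int> topLevel = Unflatten(
    Enumerable.Range(0, levels.Length),
    i => levels[i],
    (parent, child) =>
    {
        if (!childrenOf.TryGetValue(parent, out var list))
            childrenOf[parent] = list = new List<int>();
        list.Add(child);
    });

// Print the tree with the same "--->" prefix as the question.
void Print(int node, int depth)
{
    Console.WriteLine(string.Concat(Enumerable.Repeat("--->", depth)) + "H" + levels[node]);
    if (childrenOf.TryGetValue(node, out var kids))
        foreach (var kid in kids)
            Print(kid, depth + 1);
}

foreach (var root in topLevel)
    Print(root, 0);
```

With the `Title` class above, a call might look like `Unflatten(flatList, t => (int)t.Level, (parent, child) => ((List<Title>)parent.Children).Add(child))`, assuming every `Children` has first been initialised to an empty `List<Title>`. Items are attached in place (the flat list's items are mutated), so no `Copy()` step is involved.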