c# xpath html-agility-pack htmldoc

How to access the current node and its descendants in an HTML document with Html Agility Pack?


I loaded the HTML into an HtmlDocument. Now I want to select each dt together with every dd that belongs to it and store them in an array for later use. I have already tried the XPath axes syntax described at http://www.w3schools.com/xsl/xpath_axes.asp, but it didn't work at all; I just got a NullReferenceException. What am I doing wrong?

Please keep in mind that sometimes there are 2 or more **dd** elements for one **dt**. I want to add every **dd** element to the corresponding **dt**.

Many thanks in advance.

<dl>
  <dt id="one">one</dt>
  <dd>some text</dd>
  <dt id="two">two</dt>
  <dd>some text</dd>
  <dt id="three">three</dt>
  <dd>some text</dd>
  <dd>some text</dd>
  <dt id="four">four</dt>
  <dd>some text</dd>
  and so on...
</dl>

Solution

  • There is no direct link between the dt and the dd elements: they are siblings inside the dl, not parent and child. That is why I personally did not find a way to solve this with XPath. XSLT might be an option, but I have not found a quick and easy way with it either. Since you are using C#, I wrote a quick prototype of how this could look:

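    using System;
    using System.Collections.Generic;
    using System.Xml;

    public static class Program
    {
        public static void Main(string[] args)
        {
            // Maps each dt id to the text of the dd elements that belong to it.
            var dt = new Dictionary<string, List<string>>();
            // dd list of the most recently read dt; null until the first dt is seen.
            List<string> current = null;

            // XmlReader only accepts well-formed XML, so the markup in data.xml must qualify.
            using (XmlReader reader = XmlReader.Create(@"data.xml"))
            {
                bool incomingDd = false;
                while (reader.Read())
                {
                    switch (reader.NodeType)
                    {
                        case XmlNodeType.Element:
                            if (String.Equals(reader.Name, "dt", StringComparison.OrdinalIgnoreCase))
                            {
                                // Start a new group; assumes every dt has a unique id.
                                current = new List<string>();
                                dt.Add(reader.GetAttribute("id"), current);
                                incomingDd = false;
                            }
                            else if (String.Equals(reader.Name, "dd", StringComparison.OrdinalIgnoreCase))
                            {
                                incomingDd = true;
                            }
                            break;

                        case XmlNodeType.Text:
                            // Attach the text to the current dt; skip a dd that precedes any dt.
                            if (incomingDd && current != null && !String.IsNullOrEmpty(reader.Value))
                            {
                                current.Add(reader.Value);
                                incomingDd = false;
                            }
                            break;
                    }
                }
            }

            // Print each dt id with its dd count, followed by the dd texts.
            foreach (var item in dt)
            {
                Console.WriteLine($"{item.Key} {item.Value.Count}:");
                foreach (var dd in item.Value)
                {
                    Console.WriteLine($"\t{dd}");
                }
            }
        }
    }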
    

    This is not the prettiest code, but it should give you an idea of how to solve your problem.
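    Because the dt and dd elements are siblings, another way to think about the grouping is "for each dt, take the following siblings until the next dt". The sketch below illustrates that idea with LINQ to XML from the standard library rather than Html Agility Pack, so it assumes the markup is well-formed XML. `DefinitionListSketch` and `GroupByTerm` are hypothetical names chosen for this example, and the inline sample markup is a shortened version of the one in the question. The same walk should be possible with Html Agility Pack's sibling navigation (for example `HtmlNode.NextSibling`), though that is untested here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DefinitionListSketch
{
    // Hypothetical helper: maps each dt id to the text of the dd elements
    // that follow it, up to (but not including) the next dt.
    public static Dictionary<string, List<string>> GroupByTerm(XElement dl)
    {
        var groups = new Dictionary<string, List<string>>();
        foreach (XElement term in dl.Elements("dt"))
        {
            // Assumption: ids are unique; a dt without an id is keyed by its text.
            string key = (string)term.Attribute("id") ?? term.Value;
            groups[key] = term.ElementsAfterSelf()
                              .TakeWhile(e => e.Name != "dt") // stop at the next term
                              .Where(e => e.Name == "dd")     // keep only definitions
                              .Select(e => e.Value.Trim())
                              .ToList();
        }
        return groups;
    }

    public static void Main()
    {
        // Shortened sample markup based on the question.
        XElement dl = XElement.Parse(@"<dl>
  <dt id=""one"">one</dt>
  <dd>some text</dd>
  <dt id=""three"">three</dt>
  <dd>first</dd>
  <dd>second</dd>
</dl>");

        foreach (var pair in GroupByTerm(dl))
        {
            Console.WriteLine($"{pair.Key}: {string.Join(", ", pair.Value)}");
        }
    }
}
```

    Compared with the streaming XmlReader loop, this loads the whole document into memory, which is usually acceptable for a single page and keeps the grouping rule in one expression.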