Tags: c#, .net, xpath, html-agility-pack

HtmlAgilityPack: getting the id of a parent node


Given the snippet of HTML and code below, if you know part of the src (e.g. 'FileName'), how do you get the post ID of the parent div? That div could be higher up the DOM tree, and there could be 0, 1, or many src attributes containing the same 'FileName'.

I'm after "postId_19701770"

I've attempted to follow this page and this page, but I get error CS1061: 'HtmlNodeCollection' does not contain a definition for 'ParentNode'.

using System;
using HtmlAgilityPack;

namespace GetParent
{
    class Program
    {
        static void Main(string[] args)
        {
            var html =
@"<body>
<div id='postId_19701770' class='b-post'>
            <h1>This is <b>bold</b> heading</h1>
            <p>This is <u>underlined</u> paragraph <div src='example.com/FileName_720p.mp4' </div></p>
</div>
        </body>";

            var htmlDoc = new HtmlDocument();
            htmlDoc.LoadHtml(html);
            string keyword = "FileName";
            var node = htmlDoc.DocumentNode.SelectNodes("//*[text()[contains(., '" + keyword + "')]]");

            var parentNode = node.ParentNode;

            Console.WriteLine(parentNode.Name);

            Console.ReadLine();
        }
    }
}

Solution

  • Your code does not compile because you are reading ParentNode from a collection of nodes: SelectNodes returns an HtmlNodeCollection, not a single HtmlNode. You need to select a single node and then look up its parent.

    You can also search by src, selecting every node whose src attribute contains the text you are looking for. Once you have the collection, you can inspect each node to find the one you need, or take the First() node from the collection and read its ParentNode.

    var html =
    @"<body>
    <div id='postId_19701770' class='b-post'>
    <h1>This is <b>bold</b> heading</h1>
    <p>This is <u>underlined</u> paragraph <div src='example.com/FileName_720p.mp4' </div></p>
    </div>
    </body>";
    
    var htmlDoc = new HtmlDocument();
    htmlDoc.LoadHtml(html);
    string keyword = "FileName";
    var node = htmlDoc.DocumentNode.SelectNodes("//*[contains(@src, '" + keyword + "')]");
    
    // node is a collection, so take the first node, for example (First() needs using System.Linq;)
    var parent = node.First().ParentNode;
    Console.WriteLine(parent.GetAttributeValue("id", string.Empty));
    
    // Prints: postId_19701770
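
    The question mentions that there may be 0, 1, or many matching src values. SelectNodes returns null rather than an empty collection when nothing matches, so calling First() on the result fails in the zero-match case. A minimal sketch of a null-safe variant follows; the class and method names (ParentIdLookup, ParentIdsBySrc) are hypothetical, and it assumes the keyword contains no single quote, since it is concatenated into the XPath string.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using HtmlAgilityPack;

    public static class ParentIdLookup
    {
        // Hypothetical helper: returns the id of the direct parent of every
        // node whose src attribute contains the keyword.
        public static List<string> ParentIdsBySrc(string html, string keyword)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Assumption: keyword has no single quote, otherwise the XPath breaks.
            var matches = doc.DocumentNode.SelectNodes(
                "//*[contains(@src, '" + keyword + "')]");

            // SelectNodes returns null when nothing matches.
            if (matches == null)
                return new List<string>();

            return matches
                .Where(n => n.ParentNode != null)
                .Select(n => n.ParentNode.GetAttributeValue("id", string.Empty))
                .ToList();
        }
    }
    ```

    An empty list means no src matched; an empty string in the list means the direct parent of a match has no id attribute.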
    

    Instead of looking up all matching nodes, you can search for just one node with the SelectSingleNode method:

    var singleNode = htmlDoc.DocumentNode.SelectSingleNode(@"//*[contains(@src, '" + keyword + "')]");
    Console.WriteLine(singleNode.ParentNode.GetAttributeValue("id", string.Empty));
    
    // Prints: postId_19701770
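
    ParentNode only looks one level up, while the question notes that the post div could sit higher up the DOM tree. One possible approach is to walk the ancestors of each match until a div with a post-style id is found. The sketch below uses HtmlNode.Ancestors, which yields the nearest ancestor first. The names PostIdLookup and FindPostIds are hypothetical, and the "postId_" prefix passed by the caller is an assumption taken from the sample HTML, not something the library knows about.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using HtmlAgilityPack;

    public static class PostIdLookup
    {
        // Hypothetical helper: for every node whose src contains the keyword,
        // finds the nearest enclosing <div> whose id starts with idPrefix.
        public static List<string> FindPostIds(string html, string keyword, string idPrefix)
        {
            var doc = new HtmlDocument();
            doc.LoadHtml(html);

            // Assumption: keyword has no single quote, otherwise the XPath breaks.
            var matches = doc.DocumentNode.SelectNodes(
                "//*[contains(@src, '" + keyword + "')]");

            // SelectNodes returns null when nothing matches.
            if (matches == null)
                return new List<string>();

            return matches
                // Ancestors("div") walks upward from the parent, nearest first.
                .Select(n => n.Ancestors("div").FirstOrDefault(a =>
                    a.GetAttributeValue("id", string.Empty)
                     .StartsWith(idPrefix, StringComparison.Ordinal)))
                // Skip matches that are not inside any post div.
                .Where(post => post != null)
                .Select(post => post.GetAttributeValue("id", string.Empty))
                // Several matches in the same post are reported once.
                .Distinct()
                .ToList();
        }
    }
    ```

    A call such as PostIdLookup.FindPostIds(html, "FileName", "postId_") returns an empty list when nothing matches and one entry per distinct enclosing post otherwise, which covers the 0, 1, or many cases without relying on the match being a direct child of the post div.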