Tags: html, powershell, html-parsing, html-agility-pack

PowerShell 2.0 - Using HtmlAgilityPack to get children of FORM elements


The main problem stems from the fact that HtmlAgilityPack won't return child nodes of a <form> element by default. See "How to get all input elements in a form with HtmlAgilityPack without getting a null reference error" for more information.

The problem is, that link shows how to fix the issue in C#, but I need to fix it in PowerShell. Any ideas?


Here is a simplified version of my HTML:

<form method="POST" action="post.aspx" id="form">
    <div>
        <input type="hidden" name="test1" id="test1" value="1" />
    </div>
    <input type="text" name="test2" id="test2" value="12345" />
</form>

When I select the <form> element, I don't get any children back, which is why I couldn't select the <input> elements.

Add-Type -Path "C:\Program Files (x86)\HtmlAgilityPack\HtmlAgilityPack.dll"
$HTMLDocument = New-Object HtmlAgilityPack.HtmlDocument
$HTMLDocument.Load("C:\users\smithj\Desktop\test2.html")
$inputNodes=$HTMLDocument.DocumentNode.SelectNodes("//form")
$inputNodes

# Output shortened to show important bits ...
ChildNodes           : {}
HasChildNodes        : False

You can see that HasChildNodes is equal to false.

According to the C# answer linked above, I need to run HtmlNode.ElementsFlags.Remove("form"); but I can't figure out the equivalent PowerShell syntax.

Thanks again!


EDIT

Thanks to har07 for pointing me in the right direction. [HtmlAgilityPack.HtmlNode]::ElementsFlags.Remove("form") was what I needed to run.

Note that the command has to run before the HTML is loaded, because the flag is applied when the document is parsed.

> Add-Type -Path ".\Net40\HtmlAgilityPack.dll"
> [HtmlAgilityPack.HtmlNode]::ElementsFlags.Remove("form")
True
>
> $HTMLDocument = New-Object HtmlAgilityPack.HtmlDocument
> $HTMLDocument.Load(".\file.html")
> $HTMLDocument.DocumentNode.SelectNodes("//form")

# Output shortened to show important bits ...
ChildNodes           : {#text, div, #text, input...}
HasChildNodes        : True
OuterHtml            : <form method="POST" action="post.aspx" id="form">
                           <div>
                               <input type="hidden" name="test1" id="test1" value="1">
                           </div>
                           <input type="text" name="test2" id="test2" value="12345">
                       </form>
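
With the flag removed, the original goal of reading the <input> elements under the form should be reachable. Below is a rough sketch, assuming the same DLL path and file.html as in the session above (adjust both to your setup). The XPath expression, the [void] cast used to suppress the True output, and the output format are my own choices for illustration, not something taken from the linked answer.

```powershell
# Assumption: the DLL path and file.html match the session above.
Add-Type -Path ".\Net40\HtmlAgilityPack.dll"

# Must run before the document is loaded. [void] discards the boolean result.
[void][HtmlAgilityPack.HtmlNode]::ElementsFlags.Remove("form")

$HTMLDocument = New-Object HtmlAgilityPack.HtmlDocument
$HTMLDocument.Load(".\file.html")

# Select every <input> nested anywhere inside the form with id="form".
$inputNodes = $HTMLDocument.DocumentNode.SelectNodes("//form[@id='form']//input")

# SelectNodes returns $null when nothing matches, so guard before looping.
if ($null -ne $inputNodes) {
    foreach ($node in $inputNodes) {
        # GetAttributeValue takes the attribute name and a default value.
        $name  = $node.GetAttributeValue("name", "")
        $value = $node.GetAttributeValue("value", "")
        "{0} = {1}" -f $name, $value
    }
}
```

The $null check matters because an XPath query with no matches gives back $null rather than an empty collection, which is the same null reference problem the linked C# question describes.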

Solution

  • I'm not a PowerShell user, but according to this blog post, static members are reached through the type name in square brackets followed by ::, so you may want to try something like this:

    [HtmlAgilityPack.HtmlNode]::ElementsFlags.Remove("form")
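
For reference, PowerShell reaches a .NET static member with [TypeName]::Member, and any instance call is chained after it. Here is a minimal sketch of that pattern using only built-in .NET types, chosen purely for illustration since they are not part of the original question:

```powershell
# Static method: [Type]::Method(args)
$ext = [System.IO.Path]::GetExtension("file.html")

# Static property, then an instance method on the object it returns:
# [Type]::StaticMember.Method(args)
$count = [System.Text.Encoding]::UTF8.GetByteCount("form")

"$ext $count"   # prints: .html 4
```

ElementsFlags is a static field on HtmlNode, so by the same pattern the type name HtmlAgilityPack.HtmlNode goes inside the brackets and ElementsFlags.Remove("form") follows the ::, as in the working command from the question's EDIT.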