The main problem stems from the fact that HtmlAgilityPack won't return child nodes for a <form> element by default. See "How to get all input elements in a form with HtmlAgilityPack without getting a null reference error" for more information.
The catch is that the linked answer shows how to fix the issue in C#, but I need to fix it in PowerShell. Any ideas?
Here is a simplified version of my HTML:
<form method="POST" action="post.aspx" id="form">
<div>
<input type="hidden" name="test1" id="test1" value="1" />
</div>
<input type="text" name="test2" id="test2" value="12345" />
</form>
When I select the <form> element, I don't get any children back, which is why I couldn't select the <input> elements.
Add-Type -Path "C:\Program Files (x86)\HtmlAgilityPack\HtmlAgilityPack.dll"
$HTMLDocument = New-Object HtmlAgilityPack.HtmlDocument
$HTMLDocument.Load("C:\users\smithj\Desktop\test2.html")
$inputNodes=$HTMLDocument.DocumentNode.SelectNodes("//form")
$inputNodes
# Output shortened to show important bits ...
ChildNodes : {}
HasChildNodes : False
You can see that HasChildNodes is False.
According to the C# answer I linked, I need to run HtmlNode.ElementsFlags.Remove("form"); but I can't figure out the equivalent PowerShell syntax.
Thanks again!
Thanks to har07 for pointing me in the right direction. [HtmlAgilityPack.HtmlNode]::ElementsFlags.Remove("form") was what I needed to run.
Note that the command has to run before the HTML is loaded, because the parser consults ElementsFlags while it builds the document.
> Add-Type -Path ".\Net40\HtmlAgilityPack.dll"
> [HtmlAgilityPack.HtmlNode]::ElementsFlags.Remove("form")
True
>
> $HTMLDocument = New-Object HtmlAgilityPack.HtmlDocument
> $HTMLDocument.Load(".\file.html")
> $HTMLDocument.DocumentNode.SelectNodes("//form")
# Output shortened to show important bits ...
ChildNodes : {#text, div, #text, input...}
HasChildNodes : True
OuterHtml : <form method="POST" action="post.aspx" id="form">
<div>
<input type="hidden" name="test1" id="test1" value="1">
</div>
<input type="text" name="test2" id="test2" value="12345">
</form>
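With the form's children available, picking out the <input> elements (the original goal) could look roughly like the sketch below. It is only an illustration under a few assumptions: the DLL path is the one from the session above, the HTML is the simplified sample inlined as a here-string rather than read from a file, and the XPath and the Name/Value property names are chosen for this example.

```powershell
# Assumed path to the assembly; adjust to wherever HtmlAgilityPack.dll lives.
Add-Type -Path ".\Net40\HtmlAgilityPack.dll"

# Remove the special handling of <form> before any HTML is parsed.
# [void] just suppresses the True/False that Remove() returns.
[void][HtmlAgilityPack.HtmlNode]::ElementsFlags.Remove("form")

# The simplified sample from the question, inlined for illustration.
$html = @'
<form method="POST" action="post.aspx" id="form">
<div>
<input type="hidden" name="test1" id="test1" value="1" />
</div>
<input type="text" name="test2" id="test2" value="12345" />
</form>
'@

$doc = New-Object HtmlAgilityPack.HtmlDocument
$doc.LoadHtml($html)

# SelectNodes returns $null when nothing matches, so guard before looping.
$inputs = $doc.DocumentNode.SelectNodes("//form[@id='form']//input")
if ($null -ne $inputs) {
    foreach ($node in $inputs) {
        # Emit one object per <input>; the second argument is the default
        # used when the attribute is missing.
        [pscustomobject]@{
            Name  = $node.GetAttributeValue("name", "")
            Value = $node.GetAttributeValue("value", "")
        }
    }
}
```

The $null check matters because SelectNodes gives back nothing at all, rather than an empty collection, when the XPath has no matches, which is the null reference situation the linked question describes.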
Actually, I'm not a PowerShell user, but according to this blog post, you may want to try something like this:
[HtmlAgilityPack.HtmlNode.ElementsFlags]::Remove("form")