
HtmlAgilityPack in combination with PowerShell and Windows authentication


I have a tool called Lansweeper that runs on a local server. I want to scrape a page from it, but it uses Windows authentication. I use PowerShell as the scripting language and mainly use HtmlAgilityPack to scrape, but I've never scraped a page that uses Windows authentication.

Does anyone know how I can pass my credentials so that the page is opened under a specific account (for example, my administrator account instead of my normal one)? Yes, I could add my normal user to the allowed users in Lansweeper, but that's not a solution I'd like to use.

I've tried the following, but it doesn't work:

[Reflection.Assembly]::LoadFile("C:\Scraping\HtmlAgilityPack\lib\Net45\HtmlAgilityPack.dll")
[HtmlAgilityPack.HtmlWeb]$web = @{}
$webclient = new-object System.Net.WebClient
$username = "user"
$password = "passw0rd-"
$domain = "mydomain"
$webclient.Credentials = new-object System.Net.NetworkCredential($username, $password, $domain)
[HtmlAgilityPack.HtmlDocument]$doc = $web.Load("http://lansweeper:81/user.aspx?username=sam&userdomain=mydomain","","",$webclient.Credentials) 
[HtmlAgilityPack.HtmlNodeCollection]$nodes = $doc.DocumentNode.SelectNodes("//body")

I have been looking into the available methods and came across two possibilities:

TypeName   : HtmlAgilityPack.HtmlWeb
Name       : Load
HtmlAgilityPack.HtmlDocument Load(string url), 
HtmlAgilityPack.HtmlDocument Load(string url, string proxyHost, int proxyPort, string userId, string password), 
HtmlAgilityPack.HtmlDocument Load(string url, string method), 
HtmlAgilityPack.HtmlDocument Load(string url, string method, System.Net.WebProxy proxy, System.Net.NetworkCredential credentials)

Name       : Get
MemberType : Method
void Get(string url, string path), 
void Get(string url, string path, System.Net.WebProxy proxy, System.Net.NetworkCredential credentials), 
void Get(string url, string path, string method), 
void Get(string url, string path, System.Net.WebProxy proxy, System.Net.NetworkCredential credentials, string method)

But I can't get either of them to work. Has anyone ever done this with PowerShell?


Solution

  • I found out how to do it; I hope it helps someone in the future. It wasn't straightforward to figure out, but it's easy once you see it.

    [Reflection.Assembly]::LoadFile("C:\temp\HtmlAgilityPack\lib\Net45\HtmlAgilityPack.dll") | Out-Null
    [HtmlAgilityPack.HtmlWeb]$web = @{}
    $url = "http://lansweeper:81/user.aspx?username=sam&userdomain=mydomain"
    $webclient = new-object System.Net.WebClient
    
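    # NetworkCredential has no UseDefaultCredentials property, so take the
    # current Windows user's credentials from the credential cache instead.
    $defaultCredentials = [System.Net.CredentialCache]::DefaultNetworkCredentials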
    
    $proxyAddr = (get-itemproperty 'HKCU:\Software\Microsoft\Windows\CurrentVersion\Internet Settings').ProxyServer
    $proxy = new-object System.Net.WebProxy
    $proxy.Address = $proxyAddr
    $proxy.useDefaultCredentials = $true 
    $proxy
    
    [HtmlAgilityPack.HtmlDocument]$doc = $web.Load($url, "GET", $proxy, $defaultCredentials)
    [HtmlAgilityPack.HtmlNodeCollection]$nodes = $doc.DocumentNode.SelectNodes("//html[1]/body[1]")
    
    $nodes
    
    <# USED RESOURCES
    https://msdn.microsoft.com/en-us/library/system.net.webclient.usedefaultcredentials(v=vs.110).aspx
    https://forums.asp.net/t/2027997.aspx?HtmlAgilityPack+Stuck+trying+to+understand+HtmlWeb+Load+NetworkCredential
    https://msdn.microsoft.com/en-us/library/system.net.webclient.usedefaultcredentials.aspx
    https://stackoverflow.com/questions/571429/powershell-web-requests-and-proxies
    
    TypeName   : HtmlAgilityPack.HtmlWeb
    Name       : Load
    HtmlAgilityPack.HtmlDocument Load(string url, string proxyHost, int proxyPort, string userId, string password), 
    HtmlAgilityPack.HtmlDocument Load(string url, string method, System.Net.WebProxy proxy, System.Net.NetworkCredential credentials)
    #>
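
The solution above relies on default credentials, i.e. the account the script is running under. For the case described in the question (opening the page under a different account, such as an administrator account), a minimal sketch could pass an explicit NetworkCredential to the same Load overload. The helper names New-PageCredential and Get-PageAs are hypothetical and not part of HtmlAgilityPack, the credential values are the placeholders from the question, and the sketch assumes that HtmlWeb forwards the supplied credential to the underlying web request and that the server can be reached without a proxy.

```powershell
# Hypothetical helper: builds the credential for the account the page
# should be opened under.
function New-PageCredential {
    param(
        [string]$UserName,
        [string]$Password,
        [string]$Domain
    )
    New-Object System.Net.NetworkCredential($UserName, $Password, $Domain)
}

# Hypothetical helper: loads a page through an existing HtmlWeb instance
# using an explicit credential instead of the default one.
function Get-PageAs {
    param(
        $Web,                                      # an HtmlAgilityPack.HtmlWeb instance
        [string]$Url,
        [System.Net.NetworkCredential]$Credential
    )
    # A WebProxy without an address means "connect directly" (assumption:
    # the local server does not need to be reached through a proxy).
    $proxy = New-Object System.Net.WebProxy
    # Overload used: Load(string url, string method, WebProxy proxy, NetworkCredential credentials)
    $Web.Load($Url, "GET", $proxy, $Credential)
}

# Placeholder values taken from the question.
$cred = New-PageCredential -UserName "user" -Password "passw0rd-" -Domain "mydomain"
```

With $web and $url set up as in the solution above, the page would then be requested with `$doc = Get-PageAs -Web $web -Url $url -Credential $cred`. If a proxy is required, the $proxy object built from the registry in the solution could be passed instead of the empty one.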