Tags: powershell, html-parsing, powershell-5.0

How to Parse HTML in PowerShell 5 with -UseBasicParsing


It is often preferable to call Invoke-WebRequest with the -UseBasicParsing parameter, or to use Invoke-RestMethod instead, for both performance and network savings.

But the results from these don't have the good ol' ParsedHtml property.

How can we parse HTML when using either of these command setups?

  1. Invoke-WebRequest $site -UseBasicParsing
  2. Invoke-RestMethod $site

Solution

  • The scenario can be solved by creating a new HTMLFile COM object and writing the downloaded content into it with its IHTMLDocument2_write method

    NOTE: THIS IS ONLY VALID IN WINDOWS POWERSHELL 5.0 and 5.1

    You can deal with the listed scenarios as follows:

    1. For Invoke-WebRequest $site -UseBasicParsing

      $response = Invoke-WebRequest $site -UseBasicParsing
      $html = New-Object -ComObject "HTMLFile"; $html.IHTMLDocument2_write($response.RawContent)

    2. For Invoke-RestMethod $site

      $content = Invoke-RestMethod $site
      $html = New-Object -ComObject "HTMLFile"; $html.IHTMLDocument2_write($content)

    Now you can parse as usual, for example by getting an element by ID:

    $button = $html.getElementById('button')
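
    Putting the pieces together, here is a minimal end-to-end sketch for Windows PowerShell 5.x. The URL is a placeholder, and the variable names $response and $html are only illustrative. The catch branch is an assumption not covered above: on some machines the HTMLFile object reportedly does not expose IHTMLDocument2_write, and the commonly used workaround is to call write() with the content encoded as Unicode bytes. The sketch also passes .Content (the response body only) rather than .RawContent, which additionally carries the HTTP status line and headers.

```powershell
# Placeholder URL; replace with the page you want to parse
$site = 'https://example.com'

# Fetch the page without relying on the Internet Explorer engine
$response = Invoke-WebRequest $site -UseBasicParsing

# Create an empty HTML document through COM (Windows PowerShell only)
$html = New-Object -ComObject "HTMLFile"

try {
    # Preferred path: write the body through the IHTMLDocument2 interface
    $html.IHTMLDocument2_write($response.Content)
}
catch {
    # Assumed fallback for machines where the method above is missing:
    # write() accepts the content as a Unicode byte array
    $html.write([System.Text.Encoding]::Unicode.GetBytes($response.Content))
}

# Query the parsed document, here listing the target of every link
$html.getElementsByTagName('a') | ForEach-Object { $_.href }
```

    For the Invoke-RestMethod case the same sketch should apply if you pass the returned string in place of $response.Content, assuming the command returns the page as plain text rather than converting it to XML or JSON objects.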