Tags: performance, powershell, powershell-3.0

Why is this PowerShell code (Invoke-WebRequest / getElementsByTagName) so incredibly slow on my machines, but not others?


I wrote some screen-scraping code in PowerShell and was surprised that it took around 30 seconds to parse a few HTML tables. I stripped it down to try and figure out where all the time was being spent, and it seems to be in the getElementsByTagName calls.

I've included a script below which, on my home desktop, my work desktop, and my home slate alike, takes around 1-2 seconds per iteration (full results pasted below). However, other people in the PowerShell community are reporting far shorter times (roughly 40-150 milliseconds per iteration).

I'm struggling to find any way of narrowing down the problem, and there doesn't seem to be a pattern to the OS/PS/.NET/IE versions.

The desktop I'm currently running it on is a brand new Windows 8 install with only PS3 and .NET 4.5 installed (and all Windows Update patches). No Visual Studio. No PowerShell profile.

$url = "http://www.icy-veins.com/restoration-shaman-wow-pve-healing-gear-loot-best-in-slot"
$response = (iwr $url).ParsedHtml

# Loop through the h2 tags
$response.body.getElementsByTagName("h2") | foreach {

    # Get the table that comes after the heading
    $slotTable = $_.nextSibling

    # Grab the rows from the table, skipping the first row (column headers)
    measure-command { $rows = $slotTable.getElementsByTagName("tr") | select -Skip 1 } | select TotalMilliseconds
}

Results from my desktop (the work PC and slate give near identical results):

TotalMilliseconds
-----------------
        1575.7633
        2371.5566
        1073.7552
        2307.8844
        1779.5518
        1063.9977
        1588.5112
        1372.4927
        1248.7245
        1718.3555
         3283.843
        2931.1616
        2557.8595
        1230.5093
         995.2934

However, some people in the Google+ PowerShell community reported results like this:

 TotalMilliseconds
 -----------------
           76.9098
          112.6745
           56.6522
          140.5845
           84.9599
           48.6669
           79.9283
           73.4511
           94.0683
           81.4443
           147.809
          139.2805
          111.4078
           56.3881
           41.3386

I've tried both PowerShell ISE and a standard console, with no difference. For the work being done, these times seem kinda excessive, and judging by the posts in the Google+ community, it can go quicker!


Solution

  • See my comment in: https://connect.microsoft.com/PowerShell/feedback/details/778371/invoke-webrequest-getelementsbytagname-is-incredibly-slow-on-some-machines#tabs

    I got the same slowness running the script in 64-bit PowerShell, but when running in 32-bit mode, everything is very fast!

    Lee Holmes was able to reproduce the issue, and here is his writeup:

    "The issue is that he’s piping COM objects into another cmdlet – in this case, Select-Object. When that happens, we attempt to bind parameters by property name. Enumerating property names of a COM object is brutally slow – so we’re spending 86% of our time on two very basic CLR API calls:

    (…)
    // Get the function description from a COM type
    typeinfo.GetFuncDesc(index, out pFuncDesc);
    (…)
    // Get the function name from a COM function description
    typeinfo.GetDocumentation(funcdesc.memid, out strName, out strDoc, out id, out strHelp);
    (…)

    We might be able to do something smart here with caching.

    A workaround is to not pipe into Select-Object, but instead use language features:

    # Grab the rows from the table, skipping the first row (column headers)
    $allRows = @($slotTable.getElementsByTagName("tr"))
    $rows = $allRows[1..$allRows.Count]
    

    "
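The workaround quoted above can be wrapped in a small helper so the slicing is not repeated at every call site. This is only a minimal sketch: the function name `Get-AllButFirst` is hypothetical (it does not appear in the original post), and it assumes the collection can be materialized with `@()` as in the quoted workaround. The first line simply reports whether the session is 64-bit, since the slowdown was reported there and not in 32-bit mode.

```powershell
# Report the bitness of the current session (the slowdown was seen in 64-bit).
"64-bit process: $([Environment]::Is64BitProcess)"

# Hypothetical helper, not from the original post: return every element
# after the first without piping anything into Select-Object.
function Get-AllButFirst {
    param($Collection)

    # Materialize the collection once into a PowerShell array.
    $all = @($Collection)

    # Nothing is left after dropping the header row: return an empty array.
    if ($all.Count -lt 2) { return ,@() }

    # Slice by index; the unary comma keeps the result as a single array.
    return ,($all[1..($all.Count - 1)])
}
```

Inside the loop from the question, the measured line would then become something like `$rows = Get-AllButFirst ($slotTable.getElementsByTagName("tr"))`. Passing the collection as an argument rather than through the pipeline is the point of the exercise. Whether it reaches the fast timings on a given machine is worth confirming with `Measure-Command`.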