Tags: windows, powershell, web-scraping, html-parsing, powershell-4.0

How to change multiple headers in a table using Powershell


I am trying to change multiple header names in my code, which pulls the Team Statistics table from this site, but I am unsure where to change them manually.

For example, I tried manually changing header 8, GF to GFPG in the line where I add the 'TEAM' header, but I get the error:

Exception calling "Add" with "2" argument(s): "Item has already been added. Key in dictionary: 'GF' Key being added: 'GF'"
At C:\NHLScraper.ps1:32 char:5
+ $objHash.Add($headers[$j],$rowdata[$j])

My code:

$url = "https://www.hockey-reference.com/leagues/NHL_2020.html"

#getting the data
$data = Invoke-WebRequest $url

#grab the third table
$table = $data.ParsedHtml.getElementsByTagName("table") | Select -skip 2 | Select -First 1


#get the rows of the Team Statistics table
$rows = $table.rows

#get table headers
$headers = $rows.item(1).children | select -ExpandProperty InnerText

#count the number of rows
$NumOfRows = $rows | Measure-Object

#Manually injecting TEAM header
$headers = @($headers[0];'TEAM';$headers[1..($headers.Length-1)])

#enumerate the remaining rows (we need to skip the header row) and create a custom object
$out = for ($i=2;$i -lt $NumofRows.Count;$i++) {
 #define an empty hashtable
 $objHash=[ordered]@{}
 #getting the child rows
 $rowdata = $rows.item($i).children | select -ExpandProperty InnerText 
 for ($j=0;$j -lt $headers.count;$j++) {
    #add each row of data to the hash table using the correlated table header value
    $objHash.Add($headers[$j],$rowdata[$j])
  }

  #turn the hashtable into a custom object
  [pscustomobject]$objHash
}

$out | Select TEAM,AvAge,GP,W,L,OL,PTS,PTS%,GF,GA,SOW,SOL,SRS,SOS,TG/G,EVGF,EVGA,PP,PPO,PP%,PPA,PPOA,PK%,SH,SHA,PIM/G,oPIM/G,S,S%,SA,SV%,SO -SkipLast 1 | Export-Csv -Path "C:\$((Get-Date).ToString("'NHL Stats' yyyy-MM-dd")).csv" -NoTypeInformation

Solution

  • You can add a condition that checks whether the key has already been added and, if so, overwrites (or ignores) it:

    if (!$objHash.Contains($headers[$j])) {
        $objHash.Add($headers[$j], $rowdata[$j])
    }
    else {
        $objHash[$headers[$j]] = $rowdata[$j] # Overwrite the existing value
    }
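
The difference between Add and assignment by key can be seen in isolation. This is a minimal sketch, assuming a made-up header list in which 'GF' appears twice; the sample values and the $addFailed and $viaIndexer names are illustrative and not taken from the scraped table.

```powershell
# Hypothetical headers and row values; 'GF' is duplicated on purpose.
$headers = 'Rk', 'GF', 'GA', 'GF'
$rowdata = '1', '3.2', '2.7', '250'

# Add() refuses a key that already exists and throws.
$objHash = [ordered]@{}
$addFailed = $false
try {
    foreach ($j in 0..($headers.Count - 1)) {
        $objHash.Add($headers[$j], $rowdata[$j])
    }
}
catch {
    $addFailed = $true
}

# Assignment by key never throws; a later duplicate replaces the earlier value.
$viaIndexer = [ordered]@{}
foreach ($j in 0..($headers.Count - 1)) {
    $viaIndexer[$headers[$j]] = $rowdata[$j]
}
```

Note that overwriting silently drops the first 'GF' value, so renaming one of the duplicate headers is usually the safer choice when both columns are needed.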
    

    But after looking at your code a few times, a few things stand out:

    $out = for ($i=2;$i -lt $NumofRows.Count;$i++) {
     #define an empty hashtable
     $objHash=[ordered]@{}         # Overwritten each loop???
     #getting the child rows
     $rowdata = $rows.item($i).children | select -ExpandProperty InnerText 
     for ($j=0;$j -lt $headers.count;$j++) {
        #add each row of data to the hash table using the correlated table header value
        $objHash.Add($headers[$j],$rowdata[$j])   # Dictionary cannot have duplicate keys
      }
    
      #turn the hashtable into a custom object
      [pscustomobject]$objHash    # what do you do with this?
    }
    
    

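    You loop over the rows and re-create $objHash on every pass. That is fine as long as each [pscustomobject] is emitted before the next pass, because $out = for (...) collects everything the loop body outputs. What the loop cannot tolerate is a repeated header name: Add throws as soon as the same key is added twice to one dictionary.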
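
Whether $out ends up holding one object or all of them is easy to check in isolation. A minimal sketch with a hypothetical three-iteration loop and a made-up 'Row' property; in PowerShell, everything the loop body emits is collected by the assignment:

```powershell
# Hypothetical stand-in for the scraping loop: three passes, one object each.
$out = for ($i = 0; $i -lt 3; $i++) {
    # A fresh dictionary per pass, as in the original code.
    $objHash = [ordered]@{}
    $objHash['Row'] = $i

    # Emitting the object here is what adds it to $out.
    [pscustomobject]$objHash
}

# Inspect how many objects were collected and what the last one holds.
$collected = @($out).Count
$lastRow = $out[-1].Row
```

Checking $collected and $lastRow after running this shows whether re-creating $objHash loses earlier rows.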

    Suggested Solution

    You can keep every hashtable in a second variable, and assign by key instead of calling Add so that a duplicate key overwrites the previous value rather than throwing an exception.

    # If you want to change the header value from GF to GFPG, you can do that in the place you have defined $headers
    
    
    #get table headers
    $headers = $rows.item(1).children | select -ExpandProperty InnerText
    $headers = $headers | % { if ($_ -eq "GF") { "GFPG" }  else { $_ }} 
    
    #count the number of rows
    $NumOfRows = $rows | Measure-Object
    
    #Manually injecting TEAM header
    $headers = @($headers[0];'TEAM';$headers[1..($headers.Length-1)])
    
    #enumerate the remaining rows (we need to skip the header row) and create a custom object
    $allData = @{}
    $out = for ($i=2;$i -lt $NumofRows.Count;$i++) {
     #define an empty hashtable
     $objHash=[ordered]@{}
     #getting the child rows
     $rowdata = $rows.item($i).children | select -ExpandProperty InnerText 
     for ($j=0;$j -lt $headers.count;$j++) {
        #add each row of data to the hash table using the correlated table header value
        $objHash[$headers[$j]] = $rowdata[$j]        
      }
    
      #turn the hashtable into a custom object
      [pscustomobject]$objHash
      $allData.Add($i, $objHash)
    }
    

    I used $allData with $i as the key to store each result so that it can be accessed later.
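
Since the question asks about changing several headers, the single GF check above could be generalised with a lookup table. This is a sketch under assumptions: the $renameMap name, the sample header list, and the 'GA' to 'GAPG' entry are illustrative; only 'GF' to 'GFPG' comes from the question.

```powershell
# Hypothetical sample of scraped headers.
$headers = 'Rk', 'AvAge', 'GP', 'GF', 'GA'

# Old name -> new name. Only 'GF' -> 'GFPG' is from the question;
# 'GA' -> 'GAPG' is an assumed extra entry for illustration.
$renameMap = @{
    'GF' = 'GFPG'
    'GA' = 'GAPG'
}

# Replace every header that has an entry in the map; leave the rest unchanged.
$headers = @($headers | ForEach-Object {
    if ($renameMap.ContainsKey($_)) { $renameMap[$_] } else { $_ }
})
```

Any later Select would have to list the new names (for example GFPG instead of GF), otherwise those columns come out empty in the CSV.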