Trying to fetch the table of content using h3 tag.
link- https://learn.microsoft.com/en-us/troubleshoot/sql/releases/download-and-install-latest-updates
Trying to fetch the whole table of 'SQL Server 2019'. and convrting into json object in powershell.
| Build number or version | Service pack | Update | ..
| 15.0.4298.1 | None | CU19 | ..
| 15.0.4280.7 | None | CU18 + GDR | ..
...
...........................Tried this
` Send a request to the URL and get the HTML content using basic parsing
$response = Invoke-WebRequest -Uri "https://learn.microsoft.com/en-us/troubleshoot/sql/releases/download-and-install-latest-updates" -UseBasicParsing
$response = $response.content
"Find the table which has an h3 tag containing "sql-server-2017""
$table = ($response.ParsedHtml).getElementsByTagName("h3") | Where-Object {$_.innerText -eq "SQL Server 2017"} | Select-Object -ExpandProperty parentNode | Select-Object -ExpandProperty nextSibling
Print the table content
Write-Output $table.outerHTML
but not working
The -UseBasicParsing
parameter means you won't have any ParsedHtml data. This is the default on powershell 6+. You can confirm by just inspecting the property
$response = Invoke-WebRequest -Uri "https://learn.microsoft.com/en-us/troubleshoot/sql/releases/download-and-install-latest-updates" -UseBasicParsing
$response
You can use a HTML parser with the html data you retrieve. There are even modules made just for this.
You can also use regular expressions for a quick and dirty solution. This is not recommended as it's cumbersome and fragile as the slightest change in the website can completely break your solution. Here is an example of a quick, and very dirty regex extraction.
$tablepattern = '(?s)(?<=/table>\r?\n)<h3.+?sql-server-2019.+?(?=<h3)'
$datapattern = '(?s)<tr.+?td>(?<Version>.+?)<.+?td>(?<ServicePack>[^<]+?)</.+?>(?<Update>[^<]+?)</.+?href="(?<KBArticle>.+?)".+?>(?<KBNumber>[^<]+?)<.+?<td>(?<ReleaseDate>.+?)</'
$response = Invoke-WebRequest -Uri "https://learn.microsoft.com/en-us/troubleshoot/sql/releases/download-and-install-latest-updates" -UseBasicParsing
$null = $response.RawContent -match $tablepattern
[string]$table = $matches.Values
$data = $table -split '(?=<tr>)' | ForEach-Object{
if($_ -match $datapattern){
$matches.Remove(0)
[PSCustomObject]$matches
}
}
$data | Format-Table
KBNumber Version ReleaseDate ServicePack KBArticle Update
-------- ------- ----------- ----------- --------- ------
5023049 15.0.4298.1 February 16, 2023 None https://support.microsoft.com/help/5023049 CU19
5021124 15.0.4280.7 February 14, 2023 None https://support.microsoft.com/help/5021124 CU18 + GDR
5021125 15.0.2101.7 February 14, 2023 None https://support.microsoft.com/help/5021125 GDR
5017593 15.0.4261.1 September 28, 2022 None https://support.microsoft.com/help/5017593 CU18
5016394 15.0.4249.2 August 11, 2022 None https://support.microsoft.com/help/5016394 CU17
5014353 15.0.4236.7 June 14, 2022 None https://support.microsoft.com/help/5014353 CU16 + GDR
...
All that's left is to convert it to json
$data | ConvertTo-Json