
How to get the value of a link from a website with PowerShell?


I want to get the download URL of the latest version of GIMP from its site. I wrote a script, but it returns only the link name, and I do not know how to get the full URL.

$web = Invoke-WebRequest -Uri "https://download.gimp.org/pub/gimp/v2.10/windows/"
$web.Links | Where-Object href -like '*exe' | select -Last 1 | select -expand href  

The code above returns the link name (gimp-2.10.32-setup.exe), but I need the full URL ("https://download.gimp.org/pub/gimp/v2.10/windows/gimp-2.10.32-setup.exe"). Can someone guide me on how to do this?


Solution

  • Several download sites use the same or a very similar layout to this GIMP page, including many Apache projects such as Tomcat and ActiveMQ. I wrote a small function in the past to parse those pages, and it turned out to work for this GIMP page as well, so I thought it was worth sharing.

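    Function Extract-FilenameFromWebsite {
        [cmdletbinding()]
        Param(
            [parameter(Position=0,ValueFromPipeline)]
            $Url
        )
    
        begin{
            # One listing row: link text (file name), last-modified date, optional size
            $pattern = '<a href.+">(?<FileName>.+?\..+?)</a>\s+(?<Date>\d+-.+?)\s{2,}(?<Size>\d+\w)?'
        }
    
        process{
            # Fetch the raw HTML; -UseBasicParsing avoids the Internet Explorer DOM parser
            $website = Invoke-WebRequest $Url -UseBasicParsing
    
            # Test every line of the page against the pattern; emit one object per matching row
            switch -Regex ($website.Content -split '\r?\n'){
                $pattern {
                    [PSCustomObject]@{
                        FileName     = $matches.FileName
                        # Full URL = base URL + file name (the base must end with a slash)
                        URL          = '{0}{1}' -f $Url,$matches.FileName
                        LastModified = [datetime]$matches.Date
                        Size         = $matches.Size
                    }
                }
            }
        }
    }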
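
    To see what the pattern in the function captures, here is a minimal sketch that applies it to a single listing row. The sample line is made up for illustration (the date and size are invented, not taken from the GIMP page); only the pattern itself comes from the function.

    ```powershell
    # Same pattern as in the function
    $pattern = '<a href.+">(?<FileName>.+?\..+?)</a>\s+(?<Date>\d+-.+?)\s{2,}(?<Size>\d+\w)?'

    # Hypothetical row in the style of a directory index (date and size are invented)
    $line = '<a href="gimp-2.10.32-setup.exe">gimp-2.10.32-setup.exe</a>      2022-06-18 21:04  265M'

    if ($line -match $pattern) {
        $Matches.FileName   # gimp-2.10.32-setup.exe
        $Matches.Date       # 2022-06-18 21:04
        $Matches.Size       # 265M
    }
    ```

    The date group stops at the first run of two or more spaces, which is why a date and time separated by a single space are captured together.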
    

    The function assumes the URL passed in ends with a trailing slash. To handle URLs with or without one, add this line at the start of the process block.

    if($Url -notmatch '/$'){$Url = "$Url/"}
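
    As an alternative to joining strings, a relative href can be resolved against the page URL with the .NET System.Uri class. The sketch below assumes the base URL is a directory listing (so appending a slash is safe) and PowerShell 5 or later for the ::new() syntax; Resolve-LinkUrl is a hypothetical helper name, not part of the function above.

    ```powershell
    function Resolve-LinkUrl {
        param(
            [Parameter(Mandatory)][string]$BaseUrl,
            [Parameter(Mandatory)][string]$Href
        )
        # Without a trailing slash, Uri resolution would replace the last path segment
        if ($BaseUrl -notmatch '/$') { $BaseUrl = "$BaseUrl/" }

        # Uri(Uri baseUri, string relativeUri) combines the base and the relative link
        [uri]::new([uri]$BaseUrl, $Href).AbsoluteUri
    }

    $resolved = Resolve-LinkUrl -BaseUrl 'https://download.gimp.org/pub/gimp/v2.10/windows' -Href 'gimp-2.10.32-setup.exe'
    $resolved   # https://download.gimp.org/pub/gimp/v2.10/windows/gimp-2.10.32-setup.exe
    ```

    The same helper could be applied to the href values returned by $web.Links in the question's original script, since those are relative to the page URL.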
    

    To get the latest version, call the function like this:

    $url = 'https://download.gimp.org/pub/gimp/v2.10/windows/'
    
    $latest = Extract-FilenameFromWebsite -Url $Url | Where-Object filename -like '*exe' |
        Sort-Object LastModified | Select-Object -Last 1
    
    $latest.url
    

    Or you could expand the URL property directly while retrieving it:

    $url = 'https://download.gimp.org/pub/gimp/v2.10/windows/'
    
    $latesturl = Extract-FilenameFromWebsite -Url $Url | Where-Object filename -like '*exe' |
        Sort-Object LastModified | Select-Object -Last 1 -ExpandProperty URL
    
    $latesturl
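
    Sorting by LastModified relies on the newest release also being the most recently uploaded file. If that is a concern, one possible alternative is to sort by the version number embedded in the file name. This is only a sketch: Get-VersionFromFileName is a hypothetical helper, the file names are sample values, and it assumes the name contains a dotted version with two to four parts that [version] can parse.

    ```powershell
    # Hypothetical helper: extract the first dotted number (e.g. 2.10.32) as a [version]
    function Get-VersionFromFileName {
        param([string]$FileName)
        if ($FileName -match '\d+(\.\d+)+') { [version]$Matches[0] }
    }

    # Sample names; a plain text sort would wrongly put 2.10.8 last
    $names = 'gimp-2.10.8-setup.exe', 'gimp-2.10.32-setup.exe', 'gimp-2.10.10-setup.exe'

    $newest = $names | Sort-Object { Get-VersionFromFileName $_ } | Select-Object -Last 1
    $newest   # gimp-2.10.32-setup.exe
    ```

    With the function from this answer, the same script block could be passed to Sort-Object in place of LastModified, using $_.FileName as the input.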