powershell, screen-scraping

Determine if webpage has content using PowerShell


I have created a PHP page that does a basic select against a table to determine the last time data was inserted by a particular agent. The purpose of the page is to see whether any agent has not submitted data in the past 48 hours. Only agents who have not submitted data within that period show up on the list. Since I expect the list to be empty about 95% of the time, I need an alert that is sent only when the PHP page actually contains data.

I was trying a PowerShell script to scrape the page, and that is working well... I just need to figure out how to: 1) Scrape Page 2) If content exists -> send email 3) ELSE -> close.

I would schedule it through the standard Windows scheduled tasks. I know there are easier, or more straightforward, ways to do this, but I don't have the option to enable mail on the Linux web server...

Below is my screen scrape tool:

$web = New-Object Net.WebClient
$web | Get-Member

$web.DownloadString("http://www.bing.com")

I got the code from: http://learn-powershell.net/2011/02/11/using-powershell-to-query-web-site-information/

Any ideas?


Solution

    1) Scrape Page

    You're off to a good start. The DownloadString method will download the HTML.

    2) If content exists -> send email

    It depends on the content you are looking for. You can test the downloaded string with the comparison operators -match (regular expression) or -like (wildcard), or with the string method Contains() (literal substring). Then put the test into an if/else block, e.g.

    $string = $web.DownloadString("http://somewebsite")
    if ($string -match "regex_here") {
        Send-MailMessage -SmtpServer your_server -To to.address@domain -From from.address@domain -Subject foo -Body bar
    } else {
        # not necessary unless there is something you want to execute here.
    }
    

    Notice I used the Send-MailMessage cmdlet to send the email.
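
To make the choice between the three tests more concrete, here is a minimal sketch. It assumes the PHP page renders each overdue agent as a table cell, so the `<td` marker, the sample strings, and the helper name `Test-PageHasContent` are all hypothetical; adjust the pattern to whatever your page really emits.

```powershell
# Hypothetical helper: returns $true when the page appears to list at least one agent.
# The '<td' marker is an assumption about how the PHP page renders its rows.
function Test-PageHasContent {
    param(
        [string]$Html,
        [string]$Pattern = '<td[\s>]'
    )
    # Empty or whitespace-only pages never count as content.
    if ([string]::IsNullOrWhiteSpace($Html)) { return $false }
    return ($Html -match $Pattern)
}

# Made-up page bodies standing in for the result of DownloadString().
$sample = '<table><tr><td>agent-07</td></tr></table>'
$empty  = '<html><body></body></html>'

$sample -match '<td[\s>]'      # regular expression, case-insensitive by default
$sample -like '*<td>*'         # wildcard pattern, case-insensitive by default
$sample.Contains('<td>')       # literal substring, case-sensitive

Test-PageHasContent -Html $sample
Test-PageHasContent -Html $empty
```

If the page always contains some HTML (headers, an empty table), testing for a marker that only appears in data rows is usually safer than testing whether the string is non-empty.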

    3) ELSE -> close.

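    See above: if the test does not match, the script simply falls through and ends, so the else branch is not needed.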
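
Putting the three steps together for a scheduled task could look roughly like the sketch below. The function name `Invoke-AgentCheck`, its parameters, the subject line, and the mail settings are placeholders rather than anything from your setup. The download and the mail call are passed in as script blocks only so the logic can be tried without network or SMTP access.

```powershell
# Sketch only: the URL, pattern, SMTP server, and addresses are placeholders.
function Invoke-AgentCheck {
    param(
        [string]$Url,
        [string]$Pattern,
        # Default fetch: download the page as a string with Net.WebClient.
        [scriptblock]$Fetch = { param($u) (New-Object Net.WebClient).DownloadString($u) },
        # Default notify: mail the page body; replace server and addresses with your own.
        [scriptblock]$Notify = {
            param($body)
            Send-MailMessage -SmtpServer 'your_server' -To 'to.address@domain' -From 'from.address@domain' -Subject 'Agents not reporting' -Body $body -BodyAsHtml
        }
    )
    try {
        $html = & $Fetch $Url
    } catch {
        # A failed download is reported with a non-zero code so the task history shows it.
        Write-Warning "Could not download ${Url}: $_"
        return 1
    }
    if ($html -match $Pattern) {
        # Discard any output from the notifier so only the status code is returned.
        & $Notify $html | Out-Null
    }
    return 0
}

# In the scheduled script you would end with something like:
# exit (Invoke-AgentCheck -Url 'http://somewebsite' -Pattern 'regex_here')
```

Returning a status code instead of silently swallowing a download error means Task Scheduler records a failed run when the web server is unreachable, which is easy to miss otherwise.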