
Find specific sentence in a web page using powershell


I need to use PowerShell to resolve IP addresses via WHOIS. My company filters port 43 and WHOIS queries, so the workaround I have to use is to have PowerShell query a website such as https://who.is, read the HTTP response, and look for the organisation name matching the IP address.

So far I have managed to read the web page into PowerShell. The example here is a WHOIS lookup for yahoo.com: https://who.is/whois-ip/ip-address/206.190.36.45

So here is my snippet:

$url=Invoke-WebRequest https://who.is/whois-ip/ip-address/206.190.36.45

Now if I do:

$url.gettype()
IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     False    HtmlWebResponseObject                    Microsoft.PowerShell.Commands.WebResponseObject

I see this object has several properties:

Name              MemberType Definition
----              ---------- ----------
Equals            Method     bool Equals(System.Object obj)
GetHashCode       Method     int GetHashCode()
GetType           Method     type GetType()
ToString          Method     string ToString()
AllElements       Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection AllElements {get;}
BaseResponse      Property   System.Net.WebResponse BaseResponse {get;set;}
Content           Property   string Content {get;}
Forms             Property   Microsoft.PowerShell.Commands.FormObjectCollection Forms {get;}
Headers           Property   System.Collections.Generic.Dictionary[string,string] Headers {get;}
Images            Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Images {get;}
InputFields       Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection InputFields {get;}
Links             Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Links {get;}
ParsedHtml        Property   mshtml.IHTMLDocument2 ParsedHtml {get;}
RawContent        Property   string RawContent {get;}
RawContentLength  Property   long RawContentLength {get;}
RawContentStream  Property   System.IO.MemoryStream RawContentStream {get;}
Scripts           Property   Microsoft.PowerShell.Commands.WebCmdletElementCollection Scripts {get;}
StatusCode        Property   int StatusCode {get;}
StatusDescription Property   string StatusDescription {get;}

but every time I try commands like

$url.ToString() | select-string "OrgName"

PowerShell returns the whole HTML code because it treats the text as one single string. I found a workaround by dumping the output into a file and then reading the file back (so every line is an element of an array), but I have hundreds of IPs to check, so creating a file every time is not ideal.

I would like to know how I could read the content of the web page https://who.is/whois-ip/ip-address/206.190.36.45 and get the line that says: OrgName: Yahoo! Broadcast Services, Inc.

and just that line only.

Thanks very much for your help! :)


Solution

  • There are most likely better ways to parse this, but you were on the right track with your current logic.

    $web = Invoke-WebRequest https://who.is/whois-ip/ip-address/206.190.36.45
    $web.tostring() -split "[`r`n]" | select-string "OrgName"
    

    Select-String was returning the whole text as the match because, previously, it was one long string. Using -split we can break it up into lines and get just the result you expected.

    OrgName:        Yahoo! Broadcast Services, Inc.
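
    To see the difference without hitting the site, here is a minimal sketch that uses a small stand-in string instead of the real page (the three lines are copied from the output shown in this answer; $content, $whole and $perLine are just illustrative names). I split on "\r?\n" here so that CRLF line endings do not leave empty entries behind.

    ```powershell
    # Stand-in for the page text: ONE string that contains three lines.
    $content = (
        'OrgName:        Yahoo! Broadcast Services, Inc.',
        'OrgId:          YAHO',
        'City:           Sunnyvale'
    ) -join "`r`n"

    # Piped as-is, Select-String receives a single input string,
    # so the "matching line" it reports is the entire text.
    $whole = @($content | Select-String 'OrgName')
    $whole[0].Line -eq $content      # True

    # Split into lines first and only the matching line comes back.
    $perLine = @($content -split "\r?\n" | Select-String 'OrgName')
    $perLine[0].Line
    ```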
    

    Some string manipulation after that will get a cleaner answer. Again, there are many ways to approach this as well.

    (($web.tostring() -split "[`r`n]" | select-string "OrgName" | Select -First 1) -split ":")[1].Trim()
    

    I used Select -First 1 because select-string could return more than one object; this ensures we are working with a single match when we manipulate the string. The string is then split on the colon and trimmed to remove the spaces that are left behind.
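
    One thing to watch: -split ":" cuts at every colon, so for a field whose value contains a colon (the Ref line holds a URL, for example) index [1] would give back only "https". A possible variation, sketched here as a hypothetical helper called Get-WhoisField (my own name, not something built in), limits the split to two pieces so that everything after the first colon stays together. The sample text is a stand-in built from lines shown in this answer.

    ```powershell
    # Hypothetical helper: return the value of one "Name: value" field from a block of text.
    function Get-WhoisField {
        param(
            [string]$Text,   # the whole page or PRE text as one string
            [string]$Field   # field name, e.g. 'OrgName'
        )
        $match = $Text -split "\r?\n" |
            Select-String -Pattern "^\s*$([regex]::Escape($Field)):" |
            Select-Object -First 1
        if ($null -eq $match) { return $null }
        # ':', 2 means "split into at most two pieces", so colons in the value survive.
        ($match.Line -split ':', 2)[1].Trim()
    }

    # Stand-in text; in real use this would be $web.ToString() or $web.Content.
    $sample = (
        'OrgName:        Yahoo! Broadcast Services, Inc.',
        'Ref:            https://whois.arin.net/rest/org/YAHO'
    ) -join "`r`n"

    Get-WhoisField -Text $sample -Field 'OrgName'
    Get-WhoisField -Text $sample -Field 'Ref'
    ```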

    Since you are pulling HTML data, we could also walk through those properties to get more specific results. The intention of this was to get 1RedOne's answer.

    $web = Invoke-WebRequest https://who.is/whois-ip/ip-address/206.190.36.45
    $data = $web.AllElements | Where{$_.TagName -eq "Pre"} | Select-Object -Expand InnerText
    $whois = ($data -split "`r`n`r`n" | select -index 1) -replace ":\s","=" | ConvertFrom-StringData
    $whois.OrgName
    

    All that data is stored in the text of the PRE tag in this example. What I do is split the data into its sections (sections are separated by blank lines, so I look for consecutive newlines). The second group of data contains the org name. Store that in a variable and pull the OrgName as a property: $whois.OrgName. Here is what $whois looks like:

    Name                           Value                                                                                                                         
    ----                           -----                                                                                                                         
    Updated                        2013-04-02                                                                                                                    
    City                           Sunnyvale                                                                                                                     
    Address                        701 First Ave                                                                                                                 
    OrgName                        Yahoo! Broadcast Services, Inc.                                                                                               
    StateProv                      CA                                                                                                                            
    Country                        US                                                                                                                            
    Ref                            http://whois.arin.net/rest/org/YAHO                                                                                           
    PostalCode                     94089                                                                                                                         
    RegDate                        1999-11-17                                                                                                                    
    OrgId                          YAHO
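
    If you want to try the section split and ConvertFrom-StringData step without the web request, here is a rough sketch on a stand-in string (the first section is a made-up placeholder, the second reuses lines from the output above; $pre and $sections are illustrative names). It also differs slightly from the one-liner: -replace ":\s","=" rewrites every colon that is followed by whitespace, so here I assume you would rather convert only the first colon on each line.

    ```powershell
    # Stand-in for the PRE text: two sections separated by a blank line.
    $pre = (
        'Placeholder:    first section (made up for this example)',
        '',
        'OrgName:        Yahoo! Broadcast Services, Inc.',
        'OrgId:          YAHO',
        'Ref:            https://whois.arin.net/rest/org/YAHO'
    ) -join "`r`n"

    # Split on two or more consecutive line breaks. The group is non-capturing
    # because -split would otherwise return the captured separators as well.
    $sections = $pre -split "(?:\r?\n){2,}"

    # Turn "Name:   value" into "Name=value" for the first colon of each line only,
    # then let ConvertFrom-StringData build the hashtable.
    $whois = $sections[1] -replace '(?m)^([^:\r\n]+):[ \t]*', '$1=' | ConvertFrom-StringData

    $whois.OrgName
    $whois.Ref
    ```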
    

    You can also make that hashtable into a custom object if you prefer dealing with those.

    [pscustomobject]$whois
    
    Updated    : 2017-01-28
    City       : Sunnyvale
    Address    : 701 First Ave
    OrgName    : Yahoo! Broadcast Services, Inc.
    StateProv  : CA
    Country    : US
    Ref        : https://whois.arin.net/rest/org/YAHO
    PostalCode : 94089
    RegDate    : 1999-11-17
    OrgId      : YAHO
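
    Since you mentioned having hundreds of IPs to check, the pieces above could be wrapped in a loop that keeps everything in memory, with no temporary files. This is only a sketch under a few assumptions: Get-WhoisOrgName is a hypothetical function name, the -Fetch parameter is my own addition so the parsing can be tried without network access, and the default fetch simply reuses the URL pattern from the question (the site's markup may differ from what is shown here).

    ```powershell
    # Hypothetical wrapper: look up OrgName for several IPs, one object per IP.
    function Get-WhoisOrgName {
        param(
            [Parameter(Mandatory)][string[]]$IpAddress,
            # Assumption: how the page text is fetched is swappable; the default uses who.is.
            [scriptblock]$Fetch = {
                param($ip)
                (Invoke-WebRequest "https://who.is/whois-ip/ip-address/$ip").Content
            }
        )
        foreach ($ip in $IpAddress) {
            $org = $null
            try {
                $text = & $Fetch $ip
                $line = $text -split "\r?\n" | Select-String 'OrgName:' | Select-Object -First 1
                if ($line) { $org = ($line.Line -split ':', 2)[1].Trim() }
            } catch {
                Write-Warning "Lookup failed for ${ip}: $_"
            }
            # OrgName stays $null when the lookup fails or no OrgName line is found.
            [pscustomobject]@{ IpAddress = $ip; OrgName = $org }
        }
    }

    # Offline try-out with a fake fetch that returns made-up text (192.0.2.x is a documentation range).
    $fake = { param($ip) "OrgName:        Example Org for $ip`r`nOrgId:          EX" }
    $results = @(Get-WhoisOrgName -IpAddress '192.0.2.1', '192.0.2.2' -Fetch $fake)
    $results
    ```

    Against the real site you would drop -Fetch and pass your own list, for example Get-WhoisOrgName -IpAddress (Get-Content .\ips.txt), where ips.txt is a placeholder file name.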