
Download HTML page source using VBS - Chrome - CMD and regex


I am trying to download a page's HTML source using VBScript and Chrome, save it to the local disk, and extract the text between two words (startString and endString) using a regex. This is what I have:

'run cmd command
Set oShell = WScript.CreateObject("WScript.Shell")
oShell.Run "cmd c: & cd Program Files\Google\Chrome\Application>chrome.exe --headless --dump-dom --enable-logging --disable-gpu https://google.com >C:\temp\source.txt"
'read txt
Dim objFile, fso
Set fso = CreateObject("Scripting.FileSystemObject")
Set objFile = fso.OpenTextFile("C:\temp\source.txt", ForReading)
'RegEx 
Dim objRegExp
Set objRegExp = New RegExp 'Set our pattern
objRegExp.Pattern = "(^.*;startString=)(.*)(;endString.*)"
objRegExp.IgnoreCase = True
objRegExp.Global = True 
Do Until objFile.AtEndOfStream 
 strSearchString = objFile.ReadLine
 Dim objMatches
 Set objMatches = objRegExp.Execute(strSearchString)
 If objMatches.Count > 0 Then
  out = out & objMatches(0) &vbCrLf
  WScript.Echo "found"
 End If
Loop
WScript.Echo out
objFile.Close

Issue 1: The CMD command does not work when run from VBScript. If I open a console, navigate to C:\Program Files\Google\Chrome\Application and run chrome.exe there, the command works fine.
Issue 2: out is always empty when echoed.


Solution

  • The code can be simplified by using InStr instead of RegExp. Additionally, as indicated in the comments, the Run command in the original code is missing /c, needlessly changes directory (and the > in Application>chrome.exe, copied from the console prompt, is treated as a redirection), and omits the bWaitOnReturn argument, so the script does not wait for Chrome to finish writing source.txt before reading it. Also note that WScript.Echo will fail to show anything if the string to be displayed exceeds 64K, whereas MsgBox always shows the first 1023 characters of the string. Here's the code rewritten to use InStr:

    ' Run Chrome headless through cmd.exe so that the ">" redirection works;
    ' the third argument (bWaitOnReturn = True) makes Run wait until Chrome exits
    Set oWSH = WScript.CreateObject("WScript.Shell")
    oWSH.Run "Cmd.exe /c ""C:\Program Files\Google\Chrome\Application\chrome.exe"" --headless --dump-dom --enable-logging --disable-gpu https://google.com >C:\temp\source.txt",,True
    ' Read the whole dump into a single string
    Set oFSO = CreateObject("Scripting.FileSystemObject")
    Contents = oFSO.OpenTextFile("C:\temp\source.txt").ReadAll
    StartString = "https://store.google.com"
    EndString = "https://mail.google.com"
    ' Locate the start marker, then look for the end marker after it
    StartPos = InStr(Contents,StartString)
    FoundText = ""
    If StartPos>0 Then
      EndPos = InStr(StartPos,Contents,EndString)
      ' The extracted text includes both markers
      If EndPos > StartPos Then FoundText = Mid(Contents,StartPos,EndPos-StartPos+Len(EndString))
    End If
    WScript.Echo FoundText
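If you still want a regex, as in the question, here is a minimal sketch using VBScript's RegExp object, assuming the markers should be matched literally and the text between them may span several lines. It works on the whole string returned by ReadAll rather than line by line, because a per-line loop can never match markers that sit on different lines. The helper names ExtractBetween and EscapeRegex are hypothetical, not part of any library. Unlike the InStr version, it returns only the text between the markers, and it keeps the question's IgnoreCase = True.

```vbscript
' Sketch: regex-based extraction of the text between two literal markers.
' ExtractBetween and EscapeRegex are hypothetical helper names.

' Escape regex metacharacters so the markers are matched literally
' (e.g. the dots in "https://store.google.com")
Function EscapeRegex(s)
  Dim oRE
  Set oRE = New RegExp
  oRE.Pattern = "([\\^$.|?*+()\[\]{}])"
  oRE.Global = True
  ' Prefix each metacharacter with a backslash
  EscapeRegex = oRE.Replace(s, "\$1")
End Function

' Return the text between StartString and EndString (markers excluded),
' or an empty string if the markers are not found in that order
Function ExtractBetween(Contents, StartString, EndString)
  Dim oRE, oMatches
  Set oRE = New RegExp
  ' [\s\S]*? matches any character, including line breaks, lazily,
  ' so the match stops at the first EndString after StartString
  oRE.Pattern = EscapeRegex(StartString) & "([\s\S]*?)" & EscapeRegex(EndString)
  oRE.IgnoreCase = True
  oRE.Global = False
  Set oMatches = oRE.Execute(Contents)
  If oMatches.Count > 0 Then
    ExtractBetween = oMatches(0).SubMatches(0)
  Else
    ExtractBetween = ""
  End If
End Function
```

With Contents read via ReadAll as in the answer's code, call it as WScript.Echo ExtractBetween(Contents, StartString, EndString).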