I'm trying to scrape the search result counter from a Google SERP. It works in Google Spreadsheets with ImportXML and RegExReplace, but not always, because of a fault in Spreadsheets. So I'm trying to do it with iMacros instead, but I can't get the scraped string filtered correctly.
In Google Spreadsheets I use:
=REGEXREPLACE(IMPORTXML("https://www.google.com/search?q=test&hl=en&as_qdr=m","//div[@id='resultStats']"),".*?([0-9,]+) (w|r)esults?","$1")
The whole imported string from the element with id="resultStats" is:
About 4,290,000 results
Here the regex .*?([0-9,]+) (w|r)esults?
filters out all the words, so I get only the result count. As I said, it doesn't work reliably in Spreadsheets.
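To make the pattern's behaviour concrete, here is a minimal JavaScript sketch of the same replacement applied to the sample string. The function name `extractResultCount` is hypothetical and used only for illustration; the pattern and the sample text are the ones quoted above.

```javascript
// Hypothetical helper: applies the same pattern as the REGEXREPLACE formula.
function extractResultCount(statsText) {
  // Lazily skip the leading words, capture the run of digits and commas
  // that sits directly before " results" / " result", and keep only that group.
  return statsText.replace(/.*?([0-9,]+) (w|r)esults?/, "$1");
}

console.log(extractResultCount("About 4,290,000 results")); // prints 4,290,000
```

Note that `replace` only rewrites the matched part, so any text following "results" in the input would remain in the output.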
The question is: how do I use this regex with iMacros to get only the number? I use this iMacros code:
VERSION BUILD=8881205 RECORDER=FX
SET !TIMEOUT_STEP 0
SET !ERRORIGNORE YES
TAB T=1
SET !DATASOURCE sr1.csv
SET !DATASOURCE_COLUMNS 1
SET !LOOP 1
SET !DATASOURCE_LINE {{!LOOP}}
SET !VAR1 EVAL("var randomNumber=Math.floor(Math.random()*45 + 16); randomNumber;")
URL GOTO={{!COL1}}
WAIT SECONDS={{!VAR1}}
TAG POS=1 TYPE=DIV ATTR=ID:resultStats EXTRACT=TXT
ADD !EXTRACT {{!URLCURRENT}}
SET !EXTRACT EVAL("decodeURI('{{!EXTRACT}}');")
SAVEAS TYPE=EXTRACT FOLDER=* FILE=+{{!NOW:ddmmyyyy}}.csv
It's very simple to do:
' ... '
TAG POS=1 TYPE=DIV ATTR=ID:resultStats EXTRACT=TXT
SET !EXTRACT EVAL("'{{!EXTRACT}}'.match(/[0-9,]+/);")
' ... '
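For reference, a small JavaScript sketch of what the expression inside EVAL does, outside of iMacros. The helper name `firstNumber` is hypothetical. It assumes the extracted text looks like the sample in the question and returns an empty string when nothing matches, since `match` returns `null` in that case.

```javascript
// Hypothetical helper mirroring '{{!EXTRACT}}'.match(/[0-9,]+/).
function firstNumber(extracted) {
  // match() without the g flag returns an array whose first element is the
  // first run of digits and commas, or null when there is no such run.
  const m = extracted.match(/[0-9,]+/);
  return m ? m[0] : "";
}

console.log(firstNumber("About 4,290,000 results")); // prints 4,290,000
```

Because `[0-9,]+` also accepts a lone comma, text that contains a comma before the number would yield "," instead; a stricter pattern such as `/[0-9][0-9,]*/` avoids that.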