Search code examples
regexgoogle-sheetsweb-scrapingimportgoogle-sheets-formula

IMPORTDATA in Google Sheets and XPath


On GoogleSheet I need to crawl the Amazon's share's price (following xPath) on this page : https://www.boursorama.com/cours/AMZN/

//*[@id="main-content"]/div/section[1]/header/div/div/div[1]/div[1]/div/div[1]/span[1]

I can do it using this formula and it works :

IMPORTXML("https://www.boursorama.com/cours/AMZN/", "//*[@id=""main-content""]/div/section[1]/header/div/div/div[1]/div[1]/div/div[1]/span[1]")

BUT there is a daily limitation of IMPORTXML function.

So to avoid this Google's daily limitation I need to use this elegant method so it should be something like this :

=REGEXEXTRACT(QUERY(TRANSPOSE(IMPORTDATA(
 "https://www.boursorama.com/cours/AMZN/"));
 "where Col1 contains 'basp:""'");"(\d+.*)""") <-- here is the line where something is wrong

I'm not used to work with REGEX, does someone can help me on it ?


Solution

  • try:

    =REGEXREPLACE(QUERY(FLATTEN(IMPORTDATA(
     "https://www.boursorama.com/cours/AMZN")),
     "where Col1 contains 'data-ist-bid-price>'", 0), 
     "</?\S+[^<>]*>", )*1
    

    enter image description here