Search code examples
web-scrapingxpathgoogle-sheets-formula

the cell appears empty in Google Sheet (importxml)


When using the importxml formula to import the price from the site into Google Sheet, the cell appears empty in Google Sheet These are my attempts below

xpath =IMPORTXML("https://tamkeenstores.com.sa/ar/product/general-supreme-window-ac-18000-btu-cool-rotary-gs1834c","//*[@id='__next']/section[6]/div/div/div[2]/div[1]/h2")

fullxpath =IMPORTXML("https://tamkeenstores.com.sa/ar/product/general-supreme-window-ac-18000-btu-cool-rotary-gs1834c","//*[@id='__next']/section[6]/div/div/div[2]/div[1]/h2")


Solution

  • It looks like the website is using JavaScript to show the price details. And IMPORTXML can't get data from this type of source. So, we have to get the price details from another source

    Below formula will get the price details

    =REGEXEXTRACT(INDEX(IMPORTXML("https://tamkeenstores.com.sa/ar/product/general-supreme-window-ac-18000-btu-cool-rotary-gs1834c","//script[contains(@type, 'application/ld+json')]"), 1), "price\W+(\d+)")
    

    Explanation:

    In the website, we can find the price 1379 in other places also like in the script tag. We can use the IMPORTXML to get the text in script tag

    
    IMPORTXML("https://tamkeenstores.com.sa/ar/product/general-supreme-window-ac-18000-btu-cool-rotary-gs1834c","//script[contains(@type, 'application/ld+json')]")
    

    As there are 2 script tags, i am using INDEX to get the data from 1st script tag only

    1 - to get the 1st row value only

    INDEX(IMPORTXML("https://tamkeenstores.com.sa/ar/product/general-supreme-window-ac-18000-btu-cool-rotary-gs1834c","//script[contains(@type, 'application/ld+json')]"), 1)
    

    Now, we have some unnecessary data returned from the above formula. So, I am using REGEXEXTRACT to get the price value only

    "price\W+(\d+)" - this regex will match the string "price":"1379.00
    price - matches the price string
    \W+ - non-word characters like ":"
    \d+ - matches the numbers
    (\d+) - matches the numbers and captures them
    
    =REGEXEXTRACT(INDEX(IMPORTXML("https://tamkeenstores.com.sa/ar/product/general-supreme-window-ac-18000-btu-cool-rotary-gs1834c","//script[contains(@type, 'application/ld+json')]"), 1), "price\W+(\d+)")