Search code examples
google-sheetsweb-scrapingimportgoogle-sheets-formulasequence

How to use IMPORTXML and SEQUENCE together in Google Sheet


=ARRAYFORMULA("https://www.amazon.com/product-reviews/B08C1W5N87/ref=cm_cr_arp_d_viewopt_rvwer?ie=UTF8&reviewerType=avp_only_reviews&sortBy=recent&pageNumber="&SEQUENCE(5,1,1,1))

I use the code above to have the links that I would like to scrap the data. There are 5 links.

=IMPORTXML(A6,"/html/body/div[1]/div[3]/div/div[1]/div/div[1]/div[5]/div[3]/div/div[*]/div/div/div[2]/a[1]/i")

I also use the formula above to scrap the data I want from the link. A6 refers to the first link the first formula creates.

What I would like to do is, if possible, I want to scrap the data from the 5 links and list them in a column.

=IMPORTXML(ARRAYFORMULA("https://www.amazon.com/product-reviews/B08C1W5N87/ref=cm_cr_arp_d_viewopt_rvwer?ie=UTF8&reviewerType=avp_only_reviews&sortBy=recent&pageNumber="&SEQUENCE(5,1,1,1)),"/html/body/div[1]/div[3]/div/div[1]/div/div[1]/div[5]/div[3]/div/div[*]/div/div/div[2]/a[1]/i")

The formula above did not work.

=ARRAYFORMULA(IMPORTXML("https://www.amazon.com/product-reviews/B08C1W5N87/ref=cm_cr_arp_d_viewopt_rvwer?ie=UTF8&reviewerType=avp_only_reviews&sortBy=recent&pageNumber="&SEQUENCE(5,1,1,1),"/html/body/div[1]/div[3]/div/div[1]/div/div[1]/div[5]/div[3]/div/div[*]/div/div/div[2]/a[1]/i"))

The formula above did not work as well. It always scraps the first link's data only.

Thank you for your help in advance.


Solution

  • keep in mind that IMPORTXML itself is a "type of arrayformula" so it is not supported under ARRAYFORMULA

    in your case try to hardcode 5 IMPORTRANGE formulae into array {} like:

    ={IMPORTRANGE(); 
      IMPORTRANGE(); 
      IMPORTRANGE(); 
      etc}
    

    update

    with new LAMBDA function its possible to do it in one go:

    =INDEX(TRIM(FLATTEN(SPLIT(FLATTEN(BYCOL(
     "https://www.amazon.com/product-reviews/B08C1W5N87/ref=cm_cr_arp_d_viewopt_rvwer?ie=UTF8&reviewerType=avp_only_reviews&sortBy=recent&pageNumber="&
     SEQUENCE(1,5,1,1), LAMBDA(x, QUERY(IMPORTXML(x, 
     "/html/body/div[1]/div[3]/div/div[1]/div/div[1]/div[5]/div[3]/div/div[*]/div/div/div[2]/a[1]/i")&"×",,9^9)))), "×"))))
    

    enter image description here