Search code examples
htmlgoogle-sheetsweb-scrapingxpathgoogle-sheets-formula

How to grab an <img src> URL using IMPORTXML and Xpath in Google Sheets?


Trying to grab the URLs or URL snippets of images from a webpage using Google Sheets IMPORTXML function. I'm fairly sure I have the Xpath right, but I either get nothing or a "that data can't be parsed" - and yet I've seen other examples here of people grabbing URLs in Google Sheets this way, though in slightly different circumstances, none of which seem to be applicable here.

Trying to grab from this page - this is the relevant HTML:

<div id="product_image" class="A_ProductImg">
        <div class="bx-wrapper" style="max-width: 100%;"><div class="bx-viewport" style="width: 100%; overflow: hidden; position: relative; height: 540px;"><ul class="A_ProductImgSlider" style="width: auto; position: relative;">
         <li class="A_ItemList" style="float: none; list-style: none; position: absolute; width: 540px; z-index: 50; display: block;">
          <div class="image A_ItemImg A_SquareOuter">
           <img src="/ec/img/D3-64I011012_M_s.jpg" onerror="this.src='/ec/images/common/NoImage.gif'" alt="main product image" id="mainImage" class="A_ItemProductImg A_Square">
          </div>
         </li>
         <li class="A_ItemList" style="float: none; list-style: none; position: absolute; width: 540px; z-index: 0; display: none;"><div class="image A_ItemImg A_SquareOuter"><img src="/ec/img/D3-64I011012_S_1m.jpg" alt="product image 1" class="A_ItemProductImg A_Square"></div></li>
         <li class="A_ItemList" style="float: none; list-style: none; position: absolute; width: 540px; z-index: 0; display: none;"><div class="image A_ItemImg A_SquareOuter"><img src="/ec/img/D3-64I011012_S_2m.jpg" alt="product image 2" class="A_ItemProductImg 

I've made the following query to try and work with the subsequent HTML:

=IMPORTXML(A2,"//*[@id='product_image']/div[1]/div[1]/ul/li[1]/div/img src")

A2 having the relevant URL.

I think the Xpath is correct there, but not sure why it won't give me the result I'm looking for. I've played around with it a bit, but no luck.


Solution

  • How about this answer? Please think of this as just one of several answers.

    Sample formulas:

    In this case, https://www.mikigakki.com/ec/pro/disp/H/D3-64I011012?sFlg=2 is put in the cell "A1".

    Pattern 1:

    =IMPORTXML(A1,"//img/@src")
    

    enter image description here

    Pattern 2:

    =IMPORTXML(A1,"//li//@src")
    

    enter image description here

    Pattern 3:

    =IMPORTXML(A1,"//li[position()>1]//@src")
    

    enter image description here

    Pattern 4:

    =ARRAYFORMULA("https://www.mikigakki.com"&IMPORTXML(A1,"//li[position()>1]//@src"))
    

    enter image description here

    Pattern 5:

    =ARRAYFORMULA("https://www.mikigakki.com"&IMPORTXML(A1,"//li[1]//@src"))
    

    enter image description here

    Note:

    • If you want to retrieve the 1st image url using *[@id='product_image'], you can also use the following formula.

      =IMPORTXML(A1,"//*[@id='product_image']/ul/li[1]/div/img/@src")
      

    Reference:

    If I misunderstood your question and this was not the result you want, I apologize.