Tags: javascript, xpath, web-scraping, macros, imacros

How to access HTML DOM Property using iMacros - xPath


iMacros ver: 10.0.2.1450 (FREE), Firefox, Windows 10

Hello. The objective is to extract the values of HTML DOM properties such as id, href and data-download-file-url for each of the images displayed on this website. I believe XPath is suitable for this task, as each image can be accessed with the following generalised XPath:

/html/body/main/section[2]/div/div/figure[X]/div

where the capital X is the image index, which takes values from 1 to 50 on the aforementioned website.

I know that the properties of Figure 1, for example, can be extracted with
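
To make the generalised path concrete, here is a minimal JavaScript sketch that builds the XPath string for each figure index. The helper name `figureXPath` is hypothetical and used only for illustration; the range 1 to 50 is the one stated in the question. The code only builds strings and does not touch the page.

```javascript
// Build the XPath of the <div> inside the N-th figure (XPath indices start at 1).
// figureXPath is a hypothetical helper name used only for this illustration.
function figureXPath(index) {
  return "/html/body/main/section[2]/div/div/figure[" + index + "]/div";
}

// The question states that X runs from 1 to 50 on the page in question.
const xpaths = [];
for (let x = 1; x <= 50; x++) {
  xpaths.push(figureXPath(x));
}

console.log(xpaths[0]);      // path for the first figure
console.log(xpaths.length);  // number of generated paths
```

Each generated string could then be placed into a TAG XPATH="..." line, one per figure.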

TAG XPATH="/html/body/main/section[2]/div/div/figure[1]"  EXTRACT=TXT

However, the line above outputs all of the DOM properties, including the ones I am not interested in.

According to the threads below:

[OP1] https://forum.imacros.net/viewtopic.php?t=26155
[OP2] How to extract specific text with imacros xpath

extracting a specific DOM property can be achieved with something like the following:

TAG XPATH="/html/body/main/section[2]/div/div/figure[1]/div[@id='showcase__content'] "  EXTRACT=TXT

However, executing it gives an error instead.

I would really appreciate it if someone could shed some light on this problem.

Example of the DOM properties for Figure 1 (the properties are all shown in pink): https://drive.google.com/open?id=190q615C3uXLZUQNI8K4AJYL3Slii1ktO


Solution

  • Your XPath contains an error: it tests @id where the element is identified by @class. Fix it with:

    //figure[1]/div[@class='showcase__content']
    

    To access the URL for downloading the file, it would be:

    //figure[1]/div[@class='showcase__content']//@data-download-file-url
    

    EDIT: To get the values of specific attributes, you have to extract the element's HTML with EXTRACT=HTM and then apply a regex. HREF attributes can be extracted directly.

    I'm not an iMacros user, so my code might not be the smartest:

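    VERSION BUILD=1005 RECORDER=CR
    URL GOTO=https://www.freepik.com/search?dates=any&format=search&page=1&query=Polygonal%20Human&sort=popular
    ' HREF can be extracted directly from the link
    TAG XPATH="//figure[1]/div[@class='showcase__content']/a" EXTRACT=HREF
    SET !VAR3 {{!EXTRACT}}
    ' Clear the previous result so that !EXTRACT holds only the HTML extracted next
    SET !EXTRACT NULL
    ' Extract the element's HTML, then pull attribute values out of it with a regex
    TAG XPATH="//figure[1]/div[@class='showcase__content']/a" EXTRACT=HTM
    SET !VAR1 EVAL("var regex = /url=\"(.+?)\"/; var str = '{{!EXTRACT}}';str.match(regex)[1];")
    SET !VAR2 EVAL("var regex = /id=\"(.+?)\"/; var str = '{{!EXTRACT}}';str.match(regex)[1];")
    ' Show the download URL, the id and the href
    PROMPT {{!VAR1}}
    PROMPT {{!VAR2}}
    PROMPT {{!VAR3}}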
    

    Side notes: free users of iMacros are limited to 3 declared variables (!VAR1 to !VAR3). You might need loops and SET !EXTRACT_TEST_POPUP NO to achieve your final goal.
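
The regex step used inside EVAL above can be tried out in plain JavaScript, outside iMacros. The sketch below is only an illustration: `extractAttribute` is a hypothetical helper, not part of iMacros, and the sample markup is made up, so the real page's attributes may look different. Unlike the short patterns in the macro, it requires whitespace before the attribute name, so that a search for id does not accidentally match inside a longer name such as data-id.

```javascript
// Return the value of one attribute from an element's outer HTML, or null
// if the attribute is not present.
// extractAttribute is a hypothetical helper, not part of iMacros.
function extractAttribute(html, name) {
  // Escape regex metacharacters in the attribute name.
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Require whitespace before the name so "id" does not match inside "data-id".
  const regex = new RegExp("\\s" + escaped + "=\"([^\"]*)\"");
  const match = html.match(regex);
  return match ? match[1] : null;
}

// Made-up markup for illustration; the real page's attributes may differ.
const sample =
  '<a href="https://example.com/item" id="12345" ' +
  'data-download-file-url="https://example.com/download/12345">';

console.log(extractAttribute(sample, "id"));
console.log(extractAttribute(sample, "data-download-file-url"));
console.log(extractAttribute(sample, "title"));
```

Returning null for a missing attribute avoids the exception that `str.match(regex)[1]` would throw when there is no match.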