google-sheets xpath google-sheets-formula

I can import only the first ten items with IMPORTXML in Google Sheets


I'm trying to import titles from a website with the IMPORTXML function in Google Sheets, but it imports only the first 10 titles from the list. I need to import the full list of titles.

This is my formula:

=IMPORTXML("https://erej.org/category/%d9%82%d8%b3%d9%85-%d8%a7%d9%84%d8%b9%d8%a8%d8%a7%d8%af%d8%a7%d8%aa/%d8%a8%d8%a7%d8%a8-%d8%a7%d9%84%d8%b7%d9%87%d8%a7%d8%b1%d8%a9/","//html//h2")

Solution

  • After checking the URL you provided: the page you are trying to pull from is dynamically generated, specifically a JavaScript-rendered website, which the IMPORTXML function does not support (see "Limitations of IMPORTXML"). Technically, this means you can only pull the titles that are already shown when the page loads, which is why you get only the first 10 titles.

    Javascript-rendered websites are not supported: this automatically excludes a large amount of websites, as it is common for popular and large websites to be rendered in Javascript.

    To confirm whether the content is added dynamically, you can check out @Ruben's answer here: How to know if Google Sheets IMPORTDATA, IMPORTFEED, IMPORTHTML or IMPORTXML functions are able to get data from a resource hosted on a website?

    Content added dynamically

    To check if the content is added dynamically, using Chrome,

    1. Open the URL of the source data.
    2. Press F12 to open Chrome Developer Tools.
    3. Press Control+Shift+P to open the Command Menu.
    4. Start typing javascript, select Disable JavaScript, and then press Enter to run the command. JavaScript is now disabled.

    JavaScript will remain disabled in this tab so long as you have DevTools open.

    Reload the page to see whether the content that you want to import is shown. If it is shown, it can be imported using the Google Sheets built-in functions. If it is not, it cannot be imported that way, although it might still be possible by other web scraping means.

    Another way to disable JavaScript is through the browser settings:

    1. Go to Settings and find Privacy and security.
    2. Click Site Settings.
    3. Go to the Content section.
    4. Click JavaScript and select Don't allow sites to use JavaScript.
    5. Reload the webpage.
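
The same check can also be approximated without a browser by looking at the HTML the server sends before any script runs. The following is only a minimal sketch using the Python standard library, not part of the referenced answer: the names `HeadingCollector`, `static_h2_titles`, `fetch_html` and `SAMPLE_HTML` are made up for illustration, and the sample markup is invented rather than taken from the site in the question. It collects the text of every `<h2>` element, which is roughly what the `//h2` part of the formula selects.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


class HeadingCollector(HTMLParser):
    """Collect the text of every <h2> element found in static HTML."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h2 = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self._in_h2 = True
            self._parts = []

    def handle_endtag(self, tag):
        if tag == "h2" and self._in_h2:
            self._in_h2 = False
            self.headings.append("".join(self._parts).strip())

    def handle_data(self, data):
        # Text inside nested tags (for example <a>) also arrives here.
        if self._in_h2:
            self._parts.append(data)


def static_h2_titles(html):
    """Return the <h2> texts present in the HTML as delivered (no JavaScript)."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return parser.headings


def fetch_html(url):
    """Download a page as text; the User-Agent value is an arbitrary assumption."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


# Invented sample: two titles are in the markup, the rest would be added by a script.
SAMPLE_HTML = """
<html><body>
  <h2><a href="/a">First title</a></h2>
  <h2>Second title</h2>
  <div id="more"></div>
  <script>/* more titles are appended here at run time */</script>
</body></html>
"""

print(static_h2_titles(SAMPLE_HTML))  # ['First title', 'Second title']
```

To try it on a real page, call `static_h2_titles(fetch_html(url))` with the URL from the formula and compare the number of titles with what the browser shows. If the script finds only the titles that IMPORTXML returns, the remaining ones are most likely added after the page loads. Treat the result as a hint rather than proof, since a server may send different HTML to different clients.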

    Note: I am not affiliated with the website/reference; I just found it through research.