node.js web-scraping jsdom node.js-got

Web Scraping NodeJs - How to recover resources when the page loads in full after several requests


I'm trying to retrieve each item (composed of an image, a word and its translation) from this page.

Link to the website: https://livingdictionaries.app/hazaragi/entries/gallery?entries_prod%5Btoggle%5D%5BhasImage%5D=true


I used jsdom and got. Here is the code:


const jsdom = require("jsdom");
const { JSDOM } = jsdom;
const got = require('got');


(async () => {
    const response = await got("https://livingdictionaries.app/hazaragi/entries/gallery?entries_prod%5Btoggle%5D%5BhasImage%5D=true");

    console.log(response.body);
    const dom = new JSDOM(response.body);
    console.log(dom.window.document.querySelectorAll(".ld-egdn1r"))
})();

When I display the HTML that is returned, it does not match what I see when I open the site in my browser. There are no HTML tags that contain the items.

When I look at the Network tab, other resources are loaded, but again I can't find the query that retrieves the words.


I think that what I am looking for is loaded over several requests, but I don't know which one.


Solution

  • Here are the steps (originally shown in a screenshot): in the browser's Network tab, find the request sent to algolia.net and copy it as fetch.

    You will then get code like this:

    fetch("https://xcvbaysyxd-dsn.algolia.net/1/indexes/*/queries", {
        "credentials": "omit",
        "headers": {},
        "referrer": "https://livingdictionaries.app/",
        "body": "...",
        "method": "POST",
        "mode": "cors"
    });
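Since the gallery loads its items over several requests, one copied request will probably return only one page of results. The sketch below shows one way to reuse the copied body for further pages. It assumes the body follows Algolia's multi-query shape, a `requests` array whose items carry a URL-encoded `params` string, and that paging is controlled by the `page` parameter. The helper name `withPage`, the index name and the sample body are hypothetical, so compare them with the body you actually copied. Note that `URLSearchParams` re-serializes the string, so the encoding of the other parameters may change cosmetically.

```javascript
// Return a copy of a multi-query request body with `page` set on every request.
// Assumes: bodyJson is a JSON string shaped like
//   { "requests": [ { "indexName": "...", "params": "query=...&page=0" } ] }
function withPage(bodyJson, page) {
    const body = JSON.parse(bodyJson);
    const requests = body.requests.map(req => {
        // `params` is a URL-encoded query string; parse it, set `page`, re-encode.
        const params = new URLSearchParams(req.params || "");
        params.set("page", String(page));
        return { ...req, params: params.toString() };
    });
    return JSON.stringify({ ...body, requests });
}

// Hypothetical body, standing in for the "body" string of the copied request.
const copiedBody = JSON.stringify({
    requests: [{ indexName: "example_index", params: "query=&hitsPerPage=20&page=0" }],
});

const next = JSON.parse(withPage(copiedBody, 1));
console.log(next.requests[0].params);
```

The returned string can be passed as the `"body"` of the copied `fetch` call, increasing `page` until a response comes back with no hits.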
    

    After that, you just have to process the data manually:

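    const fetch = require("node-fetch"); // npm i node-fetch@2 (v3 is ESM-only and cannot be require()d)

    (async () => { // await needs an async function in CommonJS
        const data = await fetch(...).then(r => r.json()); // use the copied request here
        const product = data.results.map(r => r.hits); // one array of hits per query
    })();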
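Note that `data.results.map(r => r.hits)` yields one array per query, so the result is an array of arrays. A minimal sketch of flattening it into a single list of entries follows. The helper name `collectHits` and the sample data are hypothetical. `objectID` is the identifier Algolia attaches to each hit, while `word` is only a placeholder field; the real field names for the image, the word and its translation have to be read from an actual response.

```javascript
// Flatten a parsed multi-query response into one array of hits.
// Expects: { results: [ { hits: [...] }, ... ] }; tolerates missing pieces.
function collectHits(response) {
    const results = Array.isArray(response && response.results) ? response.results : [];
    return results.flatMap(r => (Array.isArray(r.hits) ? r.hits : []));
}

// Hypothetical sample data, standing in for the JSON returned by the request.
const sample = {
    results: [
        { hits: [{ objectID: "a1", word: "example-1" }, { objectID: "a2", word: "example-2" }] },
        { hits: [{ objectID: "b1", word: "example-3" }] },
    ],
};

const entries = collectHits(sample);
console.log(entries.length); // 3
console.log(entries.map(e => e.objectID));
```

With the real response, `collectHits(data)` would replace the `product` line, and each entry can then be mapped to whatever fields are needed.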
    

    in your case