
NodeJS - how to scrape ld+json data and save it to an object


I've been trying to find a way to read the contents of an application/ld+json script tag and save them to a local object. In my program I would then like to be able to call console.log(data.offers.availability), which should log "InStock", and do the same for each of the other data values.

I currently have this:

            let content = JSON.stringify($("script[type='application/ld+json']").html())
            let filteredJson = content.replace(/\\n/g, '')
            let results = JSON.parse(filteredJson)
            console.log(results)

This prints the output below, but it still doesn't let me console.log(results.offers.availability):

 {    "@context": "http://schema.org/", 
   "@type": "Product",    "name": "Apex Legends - Bangalore - Mini Epics",
    "description": "<div class="textblock"><p><h2>Apex Legends - Bangalore - Mini Epics </h2><p>Helden uit alle uithoeken van de wereld strijden voor eer, roem en fortuin in Apex Legends. Weta Workshop betreedt the Wild Frontier en brengt Bangalore met zich mee - Mini Epics style!</p><p>Verzamel alle Apex Legends Mini Epics en voeg ook Bloodhound en Mirage toe aan je collectie!</p></p></div>",
"brand": {
        "@type": "Thing",
        "name": "Game Mania"    
},
"aggregateRating": {        
        "@type": "AggregateRating",
        "ratingValue": "5",
        "ratingCount": "2"    
},
"offers": {        
        "@type": "Offer",
        "priceCurrency": "EUR",
        "price": "19.98",        
        "availability" : "InStock"    
   }
}



Solution

  • As Bergi pointed out, the problem is that you're calling JSON.stringify on content that is already a string, so the subsequent JSON.parse just hands that string back instead of an object. Out of curiosity I tried this myself. Consider the following test:

    index.html (served at http://localhost:4000):

    <html>
    <script type="application/ld+json">
        {
            "@context": "http://schema.org",
            "@type": "Product",
            "name": "Apex Legends - Bangalore - Mini Epics",
            "offers": {
                "@type": "Offer",
                "priceCurrency": "EUR",
                "price": "19.98",
                "availability": "InStock"
            }
        }
    </script>
    <body>
    <h2>Index</h2>
    </body>
    </html>
    

    NodeJS-script:

    const superagent = require('superagent');
    const cheerio = require('cheerio');
    
    (async () => {
        const response = await superagent("http://localhost:4000");
    
        const $ = cheerio.load(response.text);
        // note that I'm not using .html(), although it works for me either way
        const jsonRaw = $("script[type='application/ld+json']")[0].children[0].data; 
        // do not use JSON.stringify on the jsonRaw content, as it's already a string
        const result = JSON.parse(jsonRaw);
        console.log(result.offers.availability);
    })()
    

    result is now an object that holds the data from the script tag, and logging result.offers.availability prints InStock as expected.
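
    To see why the stringify-then-parse round trip fails, here is a minimal sketch that needs no scraping at all. The raw constant is a hypothetical stand-in for the text that .html() returns from the script tag, trimmed down to the offers field; it is an assumption for illustration, not the real page content.

```javascript
// Hypothetical stand-in for the text read from the ld+json script tag.
const raw = '{"offers": {"@type": "Offer", "availability": "InStock"}}';

// Bug: JSON.stringify wraps the text in a second layer of JSON quoting,
// so JSON.parse only removes that layer and returns the original string.
const wrapped = JSON.parse(JSON.stringify(raw));
console.log(typeof wrapped);  // string
console.log(wrapped.offers);  // undefined

// Fix: parse the raw text directly to get an object.
const data = JSON.parse(raw);
console.log(typeof data);               // object
console.log(data.offers.availability);  // InStock
```

    Because wrapped is still a string, console.log shows what looks like JSON, but property access such as wrapped.offers yields undefined.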