html  node.js  html-parsing

Get the title, image and description of a web page in Node.js


I want to extract a web page's title, image and description in Node.js.

Website: https://www.usnews.com/education/best-colleges/articles/college-applications-are-on-the-rise-what-to-know

I was using the link-preview-js library, but it does not extract any data for this specific link. What should I do?


Solution

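  • The site is probably rejecting the request because of its user-agent, so send a different one through the headers option of link-preview-js. Here's a list of user-agents you can switch to: https://whatmyuseragent.com/engines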

    Solution:

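    const { getLinkPreview } = require('link-preview-js');

    // `await` is only valid inside an async function in a CommonJS module.
    async function fetchPreview(link) {
        return getLinkPreview(link, {
            timeout: 10000,
            // Only follow redirects that stay on the same host (with or without "www.").
            followRedirects: "manual",
            handleRedirects: (baseURL, forwardedURL) => {
                const base = new URL(baseURL).hostname;
                const forwarded = new URL(forwardedURL).hostname;
                return forwarded === base || forwarded === "www." + base;
            },
            headers: {
                // Browser-like user-agent sent instead of the default one.
                "user-agent": "Mozilla/5.0 (Linux; Android 11; vivo 1904; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/87.0.4280.141 Mobile Safari/537.36 VivoBrowser/8.7.0.1"
            }
        });
    }

    fetchPreview("https://www.usnews.com/education/best-colleges/articles/college-applications-are-on-the-rise-what-to-know")
        .then(console.log)
        .catch(console.error);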
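
Since a single user-agent may stop working for a given site, it can help to try several in turn. The following is a minimal sketch, not part of link-preview-js: `previewWithFallback` and `pickPreviewFields` are hypothetical helper names, and the preview function is passed in as a parameter so that `getLinkPreview` (or a stub) can be used. It assumes the result object exposes `title`, `description` and an `images` array of URLs, as link-preview-js does for HTML pages.

```javascript
// Hypothetical helper: try each user-agent until one returns a usable preview.
// `getPreview` is expected to behave like getLinkPreview(url, options).
async function previewWithFallback(url, userAgents, getPreview) {
  let lastError;
  for (const userAgent of userAgents) {
    try {
      const result = await getPreview(url, {
        timeout: 10000,
        headers: { "user-agent": userAgent },
      });
      // Assumption: a result without a title counts as a miss, so move on.
      if (result && result.title) return result;
    } catch (err) {
      lastError = err; // remember the failure and try the next user-agent
    }
  }
  throw lastError || new Error("No user-agent produced a preview for " + url);
}

// Hypothetical helper: reduce the preview to the three fields the question asks for.
function pickPreviewFields(result) {
  const hasImage = Array.isArray(result.images) && result.images.length > 0;
  return {
    title: result.title || null,
    description: result.description || null,
    image: hasImage ? result.images[0] : null,
  };
}
```

With the library installed, this could be called as `previewWithFallback(link, [ua1, ua2], getLinkPreview).then(pickPreviewFields)`, where `ua1` and `ua2` are user-agent strings of your choice.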