
Image download issue with NodeJS


I am trying to download Google Books pages using axios, with the following code:

const response = await axios({
    method: 'GET',
    url: url,
    responseType: 'stream'
});

This works for some images (e.g. this one) but fails for some others (like this one). Instead of serving the actual image, Google serves a default "Image not available" file.

Both requests work in the browser, but the second one fails in Node.js.

I compared the request and response headers for both requests and could not see anything relevant. Note that both images are PNGs; I don't recall facing this issue with JPEGs so far.

Why is Google not serving the second image properly?

Feel free to try it at home with the following code:

const axios = require('axios');
const fs = require('fs');

(async function () {
    const response = await axios({
        method: 'GET',
        url: 'https://books.google.fr/books/content?id=DvGApMzEJmQC&hl=fr&pg=PA61&img=1&zoom=3&sig=ACfU3U3IPtY0MOIxgMR8rJTxt9YYGPUl1Q&w=1025',
        responseType: 'stream'
    });
    const writer = fs.createWriteStream('result.png');
    response.data.pipe(writer);

    // Wait until the file is fully written, not just until the response ends.
    await new Promise((resolve, reject) => {
        writer.on('finish', resolve);
        writer.on('error', reject);
        response.data.on('error', reject);
    });
})().catch(console.error);

Solution

  • I finally found an explanation, although it does not solve the whole mystery.

    Google does not serve the second file unless the NID cookie is provided with the request. Per Google's policies:

    The NID cookie contains a unique ID Google uses to remember your preferences and other information, such as your preferred language (e.g. English), how many search results you wish to have shown per page (e.g. 10 or 20), and whether or not you wish to have Google’s SafeSearch filter turned on.

    This leaves me wondering about two things:

    • Why is a browsing-customization cookie required at all? Is it for SafeSearch?
    • Why is it only required for a few files?

    In any case, here is a solution to my issue:

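    // Run inside an async function.
    // First request: lets Google set its cookies (including NID).
    const initialRequest = await axios({
        method: 'GET',
        url: 'https://google.com'
    });

    // Keep only the "name=value" part of each Set-Cookie header.
    const cookies = (initialRequest.headers['set-cookie'] || [])
        .map((cookie) => cookie.split(';')[0])
        .join('; ');

    const response = await axios({
        method: 'GET',
        url: 'https://books.google.com/books/content?id=DvGApMzEJmQC&pg=PA61&img=1&zoom=3&sig=ACfU3U3IPtY0MOIxgMR8rJTxt9YYGPUl1Q&w=1025',
        responseType: 'stream',
        headers: {
            'Cookie': cookies
        }
    });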
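
    To check whether the NID cookie alone is enough, one option is to pull just that cookie out of the Set-Cookie headers and send nothing else. The sketch below is only an illustration: pickCookie is a hypothetical helper name (it is not part of axios), and it assumes the headers arrive as an array of strings, which is how axios exposes 'set-cookie' in Node.

```javascript
// Hypothetical helper (not part of axios): pick one cookie by name from
// an array of Set-Cookie header values and return it as "name=value".
function pickCookie(setCookieHeaders, name) {
    for (const header of setCookieHeaders || []) {
        // The cookie pair is everything before the first ';';
        // the rest (expires, path, domain...) must not be sent back.
        const pair = header.split(';')[0].trim();
        const eq = pair.indexOf('=');
        if (eq !== -1 && pair.slice(0, eq) === name) {
            return pair;
        }
    }
    return null; // cookie not present in the response
}
```

    With this, the second request could pass headers: { 'Cookie': pickCookie(initialRequest.headers['set-cookie'], 'NID') }, after checking that the result is not null. I have not verified that Google accepts the request with NID as the only cookie.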