
Node.js request module getting modern version of website


Often, when making a GET request with the request module in Node.js, an outdated version of the website's HTML is returned.

For example, a very old version of Google is returned when making a request to http://google.com. On the other hand, accessing Google in a browser returns a much more modern version of the website.

I suspect that this is related to the device/browser information that sites like Google read from incoming requests. As far as I know, request doesn't send any device information.

Is there any way to trick sites into thinking that they are being accessed by an actual device/browser (and a modern one at that)?


Solution

  • By default, the request package does not send any device information (as the question mentions). Big sites like Google use this information, sent in the User-Agent header, to tailor aspects of the page such as the HTML version and the CSS/JS features used. A newer user agent tells the site that the page can use more and newer features. To emulate a specific device (to debug a mobile page, for instance), pick an appropriate user-agent string from useragentstring.com.

    Other headers, such as Accept and Accept-Encoding, can also affect the result (doc here).

    Try this code (taken from the docs):

    var request = require('request');
    
    var options = {
      url: 'https://google.com',
      headers: {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
      }
    };
    
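    // Print the returned HTML, or report the error if the request failed.
    function callback(error, response, body) {
      if (error) {
        console.error(error);
        return;
      }
      console.log(body);
    }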
    
    request(options, callback);
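
Since the Accept and Accept-Encoding headers mentioned above can matter too, here is a minimal sketch of building a fuller, browser-like options object. The helper names buildBrowserHeaders and buildRequestOptions are hypothetical, and the Accept and Accept-Language values are illustrative assumptions rather than anything a particular site requires. The gzip: true option is the request package's own setting for requesting and decoding compressed responses.

```javascript
// Hypothetical helper: returns headers resembling what a desktop browser sends.
// The Accept and Accept-Language values are illustrative assumptions.
function buildBrowserHeaders(userAgent) {
  return {
    'User-Agent': userAgent,
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
    'Accept-Language': 'en-US,en;q=0.9'
  };
}

// Hypothetical helper: builds an options object in the shape request() accepts.
function buildRequestOptions(url, userAgent) {
  return {
    url: url,
    headers: buildBrowserHeaders(userAgent),
    // request option: adds an Accept-Encoding header and decodes the response.
    gzip: true
  };
}

var options = buildRequestOptions(
  'https://google.com',
  'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36'
);

// Show the options that would be passed to request().
console.log(options);
```

The resulting object can be passed to request(options, callback) exactly as in the answer's example. Whether a given site actually serves different markup for these extra headers varies, so it is worth comparing the responses with and without them.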