
Node.js http.get against squarespace.com site has status code 403


When I do a simple http.get for a URL that points to a Squarespace (SS) site, I get a 403 status code. I know the site is up and that the server can reach it. Here's a simple example against an SS site (not mine, but it produces the same issue):

  • Show that the server can access the site: running curl http://www.letsmoveschools.org returns all the HTML from the site.

  • Node app

    var http = require('http');
    var url;
    
    url = 'http://www.letsmoveschools.org/';
    
    var req = http.get(url, function(res) {
    
      res.on('data', function(chunk) {
       //Handle chunk data
      });
    
      res.on('end', function() {
        // parse xml
        console.log(res.statusCode);
      });
    
      // or you can pipe the data to a parser
      //res.pipe(dest);
    
    });
    
    req.on('error', function(err) {
      // log the actual error so it can be debugged
      console.error('error: ' + err.message);
    });
    

When I run the app with node app.js, it outputs the 403 status code.

I have tried this code with other sites and it works fine, just not against Squarespace sites. Is there some configuration on the SS side, or something else I need to do in Node?


Solution

  • The problem is that the remote server requires a User-Agent header, and Node does not send one automatically (curl does, which is why the curl request succeeds). Add the header and you should get back a 200 response:

    // ...
    
    url = 'http://www.letsmoveschools.org/';
    
    var opts = require('url').parse(url);
    opts.headers = {
      'User-Agent': 'javascript'
    };
    
    var req = http.get(opts, function(res) {
    // ...
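
For reference, here is a minimal self-contained sketch of the same fix. It assumes Node 10.9 or later, where http.get accepts a URL string followed by an options object, so the url.parse step is not needed. The helper names buildOptions and getStatus are hypothetical and only for illustration, and 'javascript' is just the placeholder value used above; any non-empty User-Agent string may or may not satisfy a given server.

```javascript
var http = require('http');

// Hypothetical helper: builds request options that carry a User-Agent header.
// Falls back to the placeholder value 'javascript' when none is given.
function buildOptions(userAgent) {
  return {
    headers: {
      'User-Agent': userAgent || 'javascript'
    }
  };
}

// Hypothetical helper: requests targetUrl with the header set and passes
// the response status code to callback(err, statusCode).
function getStatus(targetUrl, callback) {
  var req = http.get(targetUrl, buildOptions(), function(res) {
    // Discard the body; the 'end' event only fires once the data is consumed.
    res.resume();
    res.on('end', function() {
      callback(null, res.statusCode);
    });
  });

  req.on('error', function(err) {
    callback(err);
  });

  return req;
}
```

Calling getStatus('http://www.letsmoveschools.org/', function(err, code) { console.log(err || code); }) would then log the status the server returns once the header is present.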