'TypeError: Protocol "http:" not supported. Expected "https:"' error when fetching HTTPS site


I'm trying to use node-fetch to capture the contents of a page and am running into an unexpected error. I checked a similar question, but it doesn't seem relevant. I am fetching an HTTPS site using an HTTPS agent, yet the error I get is about HTTP. I wonder whether this may be due to redirects, but I can't see anything that would cause it. It only fails for this particular URL (it works fine with, for example, https://www.robinhood.com), and I'm trying to figure out why. Here is a minimal example. Note that it uses a certificate I have saved locally; I'm not sure whether that is needed to reproduce the problem.

//start SO example
import path from 'path';
import {fileURLToPath} from 'url';
import http from 'node:http';
import https from 'node:https';
import fetch from 'node-fetch';
import sslrootcas from 'ssl-root-cas';
import UserAgent from 'user-agents';

const siteURL = "https://robinhood.com/l/privacy";
const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

// Trust the locally saved intermediate certificate in addition to the bundled root CAs
const rootCas = sslrootcas.create();
rootCas.addFile(path.resolve(__dirname, 'intermediate.pem'));

// HTTPS-only agent: it is reused for every request fetch makes, including redirects
const myhttpsAgent = new https.Agent({ca: rootCas});

// const requestcheck = fetch("https://www.google.com", {
const requestcheck = fetch(siteURL, {
  method: "GET",
  headers: {"User-Agent": new UserAgent().toString()},
  agent: myhttpsAgent
});

Here is the error I'm getting:

node:internal/errors:477
    ErrorCaptureStackTrace(err);
    ^

TypeError: Protocol "http:" not supported. Expected "https:"
    at new NodeError (node:internal/errors:387:5)
    at new ClientRequest (node:_http_client:177:11)
    at request (node:http:96:10)
    at file:///home/app/node_modules/node-fetch/src/index.js:94:20
    at new Promise (<anonymous>)
    at fetch (file:///home/app/node_modules/node-fetch/src/index.js:49:9)
    at ClientRequest.<anonymous> (file:///home/app/node_modules/node-fetch/src/index.js:236:15)
    at ClientRequest.emit (node:events:525:35)
    at HTTPParser.parserOnIncomingClient [as onIncoming] (node:_http_client:674:27)
    at HTTPParser.parserOnHeadersComplete (node:_http_common:128:17)
    at TLSSocket.socketOnData (node:_http_client:521:22)
    at TLSSocket.emit (node:events:525:35)
    at addChunk (node:internal/streams/readable:315:12)
    at readableAddChunk (node:internal/streams/readable:289:9)
    at TLSSocket.Readable.push (node:internal/streams/readable:228:10)
    at TLSWrap.onStreamRead (node:internal/stream_base_commons:190:23) {
  code: 'ERR_INVALID_PROTOCOL'
}

Solution

  • I wonder whether this may be due to redirects, but I can't see anything that would cause it.

    https://robinhood.com/l/privacy redirects to
    https://robinhood.com/us/en/support/articles/privacy-policy which then redirects to
    http://robinhood.com/us/en/support/articles/privacy-policy/

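    The last URL is plain HTTP, which is the wrong protocol for an HTTPS-only agent. node-fetch passes the same agent to every hop of the redirect chain, so the https.Agent rejects the http: request with ERR_INVALID_PROTOCOL.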
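
One possible way around this, sketched under a few assumptions: node-fetch documents that its agent option may also be a function that receives the parsed URL of the current request and returns an agent, which is meant for redirect chains that cross between HTTP and HTTPS. The names httpAgent, httpsAgent and selectAgent below are illustrative and not part of the question's code. The custom CA setup is left out so the snippet runs on its own; in the question's setup the HTTPS agent would presumably still be created with {ca: rootCas}.

```javascript
import http from 'node:http';
import https from 'node:https';

// One agent per protocol. In the question's setup the HTTPS agent would
// carry the custom CA bundle: new https.Agent({ca: rootCas}).
const httpAgent = new http.Agent();
const httpsAgent = new https.Agent();

// Hypothetical helper: return the agent that matches the protocol of the
// URL being requested. node-fetch calls an agent function with the parsed
// URL of the request it is about to make.
function selectAgent(parsedURL) {
  return parsedURL.protocol === 'http:' ? httpAgent : httpsAgent;
}

// Usage with node-fetch (not executed here):
//   const response = await fetch(siteURL, {agent: selectAgent});
```

With this in place the final http: hop gets an http.Agent instead of the HTTPS one. Note that this hop is then fetched without TLS; if that is not acceptable, passing redirect: 'manual' to fetch and inspecting the Location header yourself may be the better choice.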