Tags: node.js, express, query-string, querystring, parameter

Does Express.js respect RFC-3986 for query string?


Does Express.js respect/use the RFC-3986 standard when decoding query string parameters? Why is the direct character "è" accepted but the encoded version "%E8" isn't?

Test Expressjs http server

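'use strict';

const express = require('express');
const bodyParser = require('body-parser');

// create the Express application
const app = express();

// parse application/x-www-form-urlencoded request bodies
app.use(bodyParser.urlencoded({ extended: false }));

app.get('/test', (req, res, next) => {
  // req.query holds the parsed query string parameters
  console.log(req.query);
  // end the response so the request does not hang
  res.status(200).end();
});

app.listen(4567, '127.0.0.1', () => {
  console.log('test http server started');
});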

Request

GET localhost:4567/test?message=lorem+ipsum%2C%20foo+%E8+bar

Expected log

{ message: 'lorem ipsum, foo è bar' }

Server logs

{ message: 'lorem+ipsum%2C%20foo+%E8+bar' }

If we remove the %E8 (the encoded "è"):

Request

GET localhost:4567/test?message=lorem+ipsum%2C%20foo+bar

Server logs

{ message: 'lorem ipsum, foo bar' }

According to https://www.url-encode-decode.com/, URIs follow RFC-3986, which doesn't allow characters like è, é, à...

So it seems that Express refuses those characters, but if we try:

Request

GET localhost:4567/test?message=lorem+ipsum%2C%20foo+è+bar

Expected log

{ message: 'lorem+ipsum%2C%20foo+è+bar' }

Server logs

{ message: 'lorem ipsum, foo è bar' }

So the direct char "è" is accepted but the encoded version %E8 isn't?

I've tried reading the Express.js source but I couldn't find an answer.


Solution

  • Basically self-solved:

    First of all, I found that in UTF-8 the hex encoding of 'è' is 'C3A8' (two bytes, so '%C3%A8' when percent-encoded), not 'E8'. 'E8' is the ISO-8859-1 (Latin-1) code for 'è'.

    So Express is probably accepting all UTF-8 characters without applying the RFC-3986 standard. This would explain why '%E8' isn't accepted but the direct character 'è' is: '%E8' isn't accepted because on its own it doesn't form a valid UTF-8 byte sequence, so the value can't be decoded.
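
To check this outside of Express, here is a minimal sketch using only JavaScript built-ins. `safeDecode` is a hypothetical helper, not an Express API. It assumes the query parser decodes values as UTF-8 and falls back to the raw string when decoding fails, which is consistent with the server logs above but is not confirmed from the Express source.

```javascript
'use strict';

// Hypothetical helper (not part of Express): decode a query string value
// as UTF-8 and fall back to the raw input if the bytes are not valid UTF-8.
function safeDecode(raw) {
  try {
    // '+' stands for a space in form-style query strings
    return decodeURIComponent(raw.replace(/\+/g, ' '));
  } catch (err) {
    // decodeURIComponent throws a URIError on malformed UTF-8,
    // for example a lone %E8 that is not followed by continuation bytes
    return raw;
  }
}

// %E8 is the Latin-1 byte for 'è'; on its own it is not valid UTF-8
const latin1 = safeDecode('lorem+ipsum%2C%20foo+%E8+bar');

// %C3%A8 is the UTF-8 encoding of 'è'
const utf8 = safeDecode('lorem+ipsum%2C%20foo+%C3%A8+bar');

// '\u00E8' is 'è'; encodeURIComponent always encodes as UTF-8
const encoded = encodeURIComponent('\u00E8');

console.log(latin1);  // lorem+ipsum%2C%20foo+%E8+bar
console.log(utf8);    // lorem ipsum, foo è bar
console.log(encoded); // %C3%A8
```

Under that assumption, the whole value is left untouched as soon as one sequence fails to decode, which is why the '+' and '%2C%20' in the first request were not decoded either. Sending '%C3%A8' instead of '%E8' should give the expected log.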