Does Express.js respect the RFC 3986 standard when decoding query string parameters? Why is the literal character "è" accepted but its encoded form "%E8" is not?
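Test Express.js HTTP server
'use strict';
const express = require('express');
const bodyParser = require('body-parser');
const app = express();
// parse application/x-www-form-urlencoded request bodies (does not affect req.query)
app.use(bodyParser.urlencoded({ extended: false }));
// the route path excludes the query string; a trailing '?' would make the final 't' optional
app.get('/test', (req, res) => {
  console.log(req.query); // log the parsed query string
  res.sendStatus(200);
});
app.listen(4567, () => {
  console.log('test http server started');
});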
GET localhost:4567/test?message=lorem+ipsum%2C%20foo+%E8+bar
Expected log
{ message: 'lorem ipsum, foo è bar' }
Server logs
{ message: 'lorem+ipsum%2C%20foo+%E8+bar' }
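The raw string in the log is consistent with a decoder that first turns '+' into spaces, tries `decodeURIComponent`, and returns the input unchanged if that throws. As far as I can tell, older versions of the `qs` library (which Express 4 uses for its default "extended" query parser) behave this way, but treat that as an assumption. `safeDecode` below is a hypothetical helper illustrating the idea, not the library's actual code.

```javascript
'use strict';

// Hypothetical qs-style decoder (assumption: mirrors older qs behaviour).
// '+' means space in query strings; percent-escapes are decoded as UTF-8.
// If decoding fails, the raw input is returned unchanged.
function safeDecode(str) {
  try {
    return decodeURIComponent(str.replace(/\+/g, ' '));
  } catch (err) {
    // decodeURIComponent throws URIError on malformed UTF-8 escapes such as %E8
    return str;
  }
}

console.log(safeDecode('lorem+ipsum%2C%20foo+bar'));     // lorem ipsum, foo bar
console.log(safeDecode('lorem+ipsum%2C%20foo+%E8+bar')); // lorem+ipsum%2C%20foo+%E8+bar

// The failure comes from the lone %E8 escape:
try {
  decodeURIComponent('%E8');
} catch (err) {
  console.log(err.name); // URIError
}
```

Note that one bad escape makes the whole value fall back to the raw string, which is why even %2C and %20 stay undecoded in the log.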
If we remove the %E8 ("è") character:
GET localhost:4567/test?message=lorem+ipsum%2C%20foo+bar
Server logs
{ message: 'lorem ipsum, foo bar' }
Here I read that URIs follow RFC 3986, which doesn't allow unencoded characters like è, é, à...
So it seems that Express refuses those characters, but if we try:
GET localhost:4567/test?message=lorem+ipsum%2C%20foo+è+bar
Expected log
{ message: 'lorem+ipsum%2C%20foo+è+bar' }
Server logs
{ message: 'lorem ipsum, foo è bar' }
So the literal character "è" is accepted but its encoded form %E8 isn't?
I've tried reading the Express source code but couldn't find an answer.
Basically self-solved:
First of all, I found that in UTF-8 'è' is encoded as the two bytes C3 A8, not E8 (E8 is its single-byte code in ISO-8859-1/Latin-1).
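A quick way to confirm these byte values in Node.js: `Buffer` encodes the same character differently depending on the chosen encoding, and `encodeURIComponent` always percent-encodes the UTF-8 bytes. This is only a check of the encodings, not anything Express-specific.

```javascript
'use strict';

const ch = '\u00e8'; // 'è' (U+00E8)

// UTF-8 needs two bytes for U+00E8
const utf8Hex = Buffer.from(ch, 'utf8').toString('hex');
// ISO-8859-1 (Latin-1) uses the single byte 0xE8
const latin1Hex = Buffer.from(ch, 'latin1').toString('hex');
// encodeURIComponent percent-encodes the UTF-8 bytes
const encoded = encodeURIComponent(ch);

console.log(utf8Hex);   // c3a8
console.log(latin1Hex); // e8
console.log(encoded);   // %C3%A8
console.log(decodeURIComponent('%C3%A8') === ch); // true
```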
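So Express probably just decodes percent-escapes as UTF-8 rather than enforcing RFC 3986 strictly. That would explain why %E8 isn't decoded while a literal 'è' is: the client typically sends a literal 'è' as its UTF-8 bytes (percent-encoded as %C3%A8), whereas a lone E8 byte is not a valid UTF-8 sequence, so decoding fails and the raw string is kept.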
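The claim that a lone 0xE8 byte is invalid UTF-8 can be checked with the WHATWG `TextDecoder` (a global in Node.js 11+ and browsers) in `fatal` mode, which throws instead of substituting U+FFFD. The same byte decodes fine as Latin-1. This sketch assumes a Node.js build with ICU enabled (the default), since `fatal` decoding relies on it.

```javascript
'use strict';

const bytes = Uint8Array.of(0xe8);

// Strict UTF-8 decoding: 0xE8 starts a 3-byte sequence, but no
// continuation bytes follow, so the decoder rejects it.
let utf8Valid = true;
try {
  new TextDecoder('utf-8', { fatal: true }).decode(bytes);
} catch (err) {
  utf8Valid = false; // TypeError: the encoded data was not valid
}

// The same byte is a complete character in ISO-8859-1 (Latin-1).
const asLatin1 = Buffer.from(bytes).toString('latin1');

// The two-byte UTF-8 sequence C3 A8 decodes to the expected character.
const asUtf8 = new TextDecoder('utf-8', { fatal: true }).decode(Uint8Array.of(0xc3, 0xa8));

console.log(utf8Valid);             // false
console.log(asLatin1 === '\u00e8'); // true
console.log(asUtf8 === '\u00e8');   // true
```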