Tags: jquery, node.js, mongodb, html-entities, sanitization

When securing a comment form and related API endpoint, should input be sanitized, validated and encoded in browser, server or both?


I am trying to secure, as best as possible, a comment form in a non-CMS environment with no user authentication.

The form should be secure against both browser and curl/postman type requests.

Environment

Backend - Node.js, MongoDB Atlas and Azure web app.
Frontend - jQuery.

Below is a detailed, but hopefully not too overwhelming, overview of my current working implementation.

Following that are my questions about the implementation.

Related Libraries Used

Helmet - helps secure Express apps by setting various HTTP headers, including Content Security Policy
reCaptcha v3 - protects against spam and other types of automated abuse
DOMPurify - an XSS sanitizer
validator.js - a library of string validators and sanitizers
he - an HTML entity encoder/decoder

The general flow of data is:

/*
on click event:  
- get sanitized data
- perform some validations
- html encode the values
- get recaptcha v3 token from google
- send all data, including token, to server
- send token to google to verify
- if the response 'score' is above 0.5, add the submission to the database  
- return the entry to the client and populate the DOM with the submission   
*/ 

POST request - browser

// test input:  
// <script>alert("hi!")</script><h1>hello there!</h1> <a href="">link</a>

// sanitize the input  
var sanitized_input_1_text = DOMPurify.sanitize($input_1.val().trim(), { SAFE_FOR_JQUERY: true });
var sanitized_input_2_text = DOMPurify.sanitize($input_2.val().trim(), { SAFE_FOR_JQUERY: true });

// validation - make sure input is between 1 and 140 characters
var input_1_text_valid_length = validator.isLength(sanitized_input_1_text, { min: 1, max: 140 });
var input_2_text_valid_length = validator.isLength(sanitized_input_2_text, { min: 1, max: 140 });

// if validations pass
if (input_1_text_valid_length === true && input_2_text_valid_length === true) {

/* 
encode the sanitized input 
not sure if I should encode BEFORE adding to MongoDB  
or just add to database "as is" and encode BEFORE displaying in the DOM with $("#output").html(html_content);
*/  
var sanitized_encoded_input_1_text = he.encode(sanitized_input_1_text);
var sanitized_encoded_input_2_text = he.encode(sanitized_input_2_text);

// define parameters to send to database  
var parameters = {};
parameters.input_1_text = sanitized_encoded_input_1_text; 
parameters.input_2_text = sanitized_encoded_input_2_text; 

// get token from google and send token and input to database
// see:  https://developers.google.com/recaptcha/docs/v3#programmatically_invoke_the_challenge
grecaptcha.ready(function() {
    grecaptcha.execute('site-key-here', { action: 'submit' }).then(function(token) {
        parameters.token = token;
        jquery_ajax_call_to_my_api(parameters);
    });
});
}

POST request - server

var secret_key = process.env.RECAPTCHA_SECRET_SITE_KEY;
var token = req.body.token;
var url = `https://www.google.com/recaptcha/api/siteverify?secret=${secret_key}&response=${token}`;

// verify recaptcha token with google
var response = await fetch(url);
var response_json = await response.json();
var score = response_json.score;
var document = {};

/*
if google's response 'score' is at least 0.5, 
add submission to the database and populate client DOM with $("#output").prepend(html); 
see: https://developers.google.com/recaptcha/docs/v3#interpreting_the_score
*/
if (score >= 0.5) {

    // add submission to database 
    // return submission to client to update the DOM
    // DOM will just display this text:  <h1>hello there!</h1> <a href="">link</a>
}

GET request on page load

Logic/Assumptions:

  • Get all submissions, return to client and add to DOM with $("#output").html(html_content);.
  • Don't need to encode values before populating DOM because values are already encoded in database?

POST request from curl, postman etc

Logic/Assumptions:

  • They don't have google token, and therefore can't verify it from server, and can't add entries to the database?
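This assumption only holds if the server rejects requests whose token is missing or fails verification. A failed verification comes back from siteverify with success set to false and no score, so it helps to check success (and the action name) explicitly rather than the score alone. A minimal sketch, where isAcceptableRecaptcha, expectedAction and minScore are hypothetical names chosen for illustration:

```javascript
// Hypothetical helper: decides whether a parsed siteverify response is acceptable.
// `result` is the JSON object returned by Google's siteverify endpoint.
function isAcceptableRecaptcha(result, expectedAction, minScore) {
    // A missing or invalid token yields success: false and no score field
    if (!result || result.success !== true) return false;
    // The action should match the one passed to grecaptcha.execute() in the browser
    if (result.action !== expectedAction) return false;
    return typeof result.score === 'number' && result.score >= minScore;
}

// Request sent without a token (e.g. from curl)
console.log(isAcceptableRecaptcha(
    { success: false, 'error-codes': ['missing-input-response'] }, 'submit', 0.5
)); // false

// Request with a valid token and a good score
console.log(isAcceptableRecaptcha(
    { success: true, score: 0.9, action: 'submit' }, 'submit', 0.5
)); // true
```
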

Helmet configuration on server

app.use(
    helmet({
        contentSecurityPolicy: {
            directives: {
                defaultSrc: ["'self'"],
                scriptSrc: ["'self'", "https://somedomain.io", "https://maps.googleapis.com", "https://www.google.com", "https://www.gstatic.com"],
                styleSrc: ["'self'", "fonts.googleapis.com", "'unsafe-inline'"],
                fontSrc: ["'self'", "fonts.gstatic.com"],
                imgSrc: ["'self'", "https://maps.gstatic.com", "https://maps.googleapis.com", "data:"],
                frameSrc: ["'self'", "https://www.google.com"]
            }
        },
    })
);

Questions

  1. Should I add values to the MongoDB database as HTML encoded entities OR store them "as is" and just encode them before populating the DOM with them?

  2. If the values were to be saved as HTML entities in MongoDB, would this make searching the database for content difficult? For example, searching for "<h1>hello there!</h1> <a href="">link</a>" wouldn't return any results because the value in the database was &#x3C;h1&#x3E;hello there!&#x3C;/h1&#x3E; &#x3C;a href=&#x22;&#x22;&#x3E;link&#x3C;/a&#x3E;

  3. In my reading about securing web forms, much has been said about client side practices being fairly redundant as anything can be changed in the DOM, JavaScript can be disabled, and requests can be made directly to the API endpoint using curl or postman and therefore bypass any client side approaches.

  4. With that said should sanitization (DOMPurify), validation (validator.js) and encoding (he) be performed either: 1) client side only 2) client side and server side or 3) server side only?

For thoroughness, here is another related question:

Do any of the following components do any automatic escaping or HTML encoding when sending data from client to server? I ask because if they do, it may make some manual escaping or encoding unnecessary.

  • jQuery ajax() requests
  • Node.js
  • Express
  • Helmet
  • bodyParser (node package)
  • MongoDB native driver
  • MongoDB
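As far as I can tell, none of these HTML-encode anything in transit: the request body is only JSON-escaped or percent-encoded, and that encoding is reversed when the body is parsed on the server. The sketch below checks this for the transport step only, using JSON.stringify and URLSearchParams as stand-ins for what jQuery and a body parser do (an assumption for illustration, not the libraries' own code):

```javascript
const input = '<h1>hello there!</h1> <a href="">link</a>';

// JSON body: only JSON string escaping is applied, and JSON.parse undoes it
const jsonBody = JSON.stringify({ input_1_text: input });
const fromJson = JSON.parse(jsonBody).input_1_text;

// Form body (the default when a plain object is passed as ajax data):
// percent-encoding, which the server-side parser reverses
const formBody = new URLSearchParams({ input_1_text: input }).toString();
const fromForm = new URLSearchParams(formBody).get('input_1_text');

console.log(fromJson === input); // true
console.log(fromForm === input); // true
console.log(formBody.includes('<')); // false (sent as %3C, not as an HTML entity)
```

Either way the server receives the original characters, so any HTML encoding has to be done explicitly.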

Solution

  • After reading more around the topic, this is the approach I came up with:

    On click event:

    • Sanitize data (DOMPurify)
    • Validate data (validator.js)
    • Get recaptcha v3 token from google (reCaptcha v3)
    • Send all data, including token, to server
    • Server is using Helmet
    • Server is using Express Rate Limit and Rate Limit Mongo to limit POST requests on a certain route to X per X milliseconds (by IP address)
    • Server is behind Cloudflare proxy which provides some security and caching features (requires setting app.set('trust proxy', true) in node server file in order for rate limiter to pick up the user's actual IP address - see Express behind proxies)
    • Send token to google from server to verify (reCaptcha v3)
    • If the response 'score' is above 0.5, perform the same sanitization and validations again
    • If the validations pass, add entry to database with a moderated flag value of false

    Rather than immediately return entries to the browser, I decided instead to require a process of manual moderation which involves changing the moderated value of an entry to true. Whilst it takes away the immediacy of the response for the user, it makes it less tempting for spammers etc if responses aren't immediately published.

    • The GET request on page load then returns all entries that are moderated: true
    • HTML encode the values before displaying them (he)
    • Populate the DOM with the HTML encoded entries

    The code looked something like this:

    POST request - browser

    // sanitize the input  
    var sanitized_input_1_text = DOMPurify.sanitize($input_1.val().trim(), { SAFE_FOR_JQUERY: true });
    var sanitized_input_2_text = DOMPurify.sanitize($input_2.val().trim(), { SAFE_FOR_JQUERY: true });
    
    // validation - make sure input is between 1 and 140 characters
    var input_1_text_valid_length = validator.isLength(sanitized_input_1_text, { min: 1, max: 140 });
    var input_2_text_valid_length = validator.isLength(sanitized_input_2_text, { min: 1, max: 140 });
    
    // validation - regex to only allow certain characters
    // for pattern, see:  https://stackoverflow.com/q/63895992
    var pattern = /^(?!.*([ ,'-])\1)[a-zA-Z]+(?:[ ,'-]+[a-zA-Z]+)*$/;
    // note: validator.matches() ignores a modifiers argument when the pattern is already a RegExp
    var input_1_text_valid_characters = validator.matches(sanitized_input_1_text, pattern);
    var input_2_text_valid_characters = validator.matches(sanitized_input_2_text, pattern);
    
    // if validations pass
    if (input_1_text_valid_length === true && input_2_text_valid_length === true && input_1_text_valid_characters === true && input_2_text_valid_characters === true) {
    
    // define parameters to send to database  
    var parameters = {};
    parameters.input_1_text = sanitized_input_1_text; 
    parameters.input_2_text = sanitized_input_2_text; 
    
    // get token from google and send token and input to database
    // see:  https://developers.google.com/recaptcha/docs/v3#programmatically_invoke_the_challenge
    grecaptcha.ready(function() {
        grecaptcha.execute('site-key-here', { action: 'submit_entry' }).then(function(token) {
            parameters.token = token;
            jquery_ajax_call_to_my_api(parameters);
        });
    });
    }
    

    POST request - server

    var secret_key = process.env.RECAPTCHA_SECRET_SITE_KEY;
    var token = req.body.token;
    var url = `https://www.google.com/recaptcha/api/siteverify?secret=${secret_key}&response=${token}`;
    
    // verify recaptcha token with google
    var response = await fetch(url);
    var response_json = await response.json();
    var score = response_json.score;
    var document = {};
    
    // if google's response 'score' is at least 0.5, 
    // see: https://developers.google.com/recaptcha/docs/v3#interpreting_the_score  
    
    if (score >= 0.5) {
    
    // perform all the same sanitizations and validations to protect against
    // POST requests direct to the API via curl or postman etc  
    // if validations pass, add entry to the database with `moderated: false` property   
    
    
    }
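The "same sanitizations and validations" step on the server could look roughly like this. It is a minimal sketch rather than the code actually used: it swaps validator.js for plain JavaScript so that it runs on its own, and validateField and buildEntry are hypothetical names.

```javascript
// Same rules as the browser code: 1 to 140 characters, groups of letters
// separated by spaces, commas, apostrophes or hyphens, with no separator
// character repeated twice in a row (pattern taken from the browser code)
const PATTERN = /^(?!.*([ ,'-])\1)[a-zA-Z]+(?:[ ,'-]+[a-zA-Z]+)*$/;

// Hypothetical helper: returns the trimmed value, or null if it is rejected
function validateField(value) {
    // Reject non-string values, e.g. an object sent in a JSON body
    if (typeof value !== 'string') return null;
    const trimmed = value.trim();
    if (trimmed.length < 1 || trimmed.length > 140) return null;
    return PATTERN.test(trimmed) ? trimmed : null;
}

// Hypothetical helper: builds the document to insert, or null if any field fails
function buildEntry(body) {
    const input_1_text = validateField(body && body.input_1_text);
    const input_2_text = validateField(body && body.input_2_text);
    if (input_1_text === null || input_2_text === null) return null;
    return { input_1_text, input_2_text, moderated: false };
}

console.log(buildEntry({ input_1_text: ' hello there ', input_2_text: "O'Brien, Mary-Jane" }));
console.log(buildEntry({ input_1_text: '<script>alert("hi!")</script>', input_2_text: 'ok' })); // null
```

Because the pattern only allows letters and a few separators, markup such as the test input is rejected outright rather than cleaned up, which keeps the server check independent of anything the browser did.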
    

    GET request - browser

    Logic:

    • Get all entries with moderated: true property
    • HTML encode values before populating DOM
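The encode-on-output step can be sketched as below. This is an illustration under assumptions, not the code from the site: escapeHtml is a hand-written stand-in for the he library so the example runs without dependencies, and renderEntries and the list markup are hypothetical.

```javascript
const ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

// Stand-in for he: replaces the characters that are significant in HTML
function escapeHtml(text) {
    return String(text).replace(/[&<>"']/g, (ch) => ENTITIES[ch]);
}

// Hypothetical renderer: builds markup for the moderated entries only
function renderEntries(entries) {
    return entries
        .filter((entry) => entry.moderated === true)
        .map((entry) => '<li>' + escapeHtml(entry.input_1_text) + '</li>')
        .join('');
}

console.log(escapeHtml('<a href="">link</a>'));
// &lt;a href=&quot;&quot;&gt;link&lt;/a&gt;
```

The string returned by renderEntries could then be passed to $("#output").html(...). For a single value, jQuery's .text() is another option, since it inserts the string as text rather than parsing it as HTML.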

    Helmet configuration on server

    app.use(
        helmet({
            contentSecurityPolicy: {
                directives: {
                    defaultSrc: ["'self'"],
                    scriptSrc: ["'self'", "https://maps.googleapis.com", "https://www.google.com", "https://www.gstatic.com"],
                    connectSrc: ["'self'", "https://some-domain.com", "https://some.other.domain.com"],
                    styleSrc: ["'self'", "fonts.googleapis.com", "'unsafe-inline'"],
                    fontSrc: ["'self'", "fonts.gstatic.com"],
                    imgSrc: ["'self'", "https://maps.gstatic.com", "https://maps.googleapis.com", "data:", "https://another-domain.com"],
                    frameSrc: ["'self'", "https://www.google.com"]
                }
            },
        })
    );
    

    In answer to my questions in the OP:

    1. Should I add values to the MongoDB database as HTML encoded entities OR store them "as is" and just encode them before populating the DOM with them?

    As long as the input is sanitised and validated on both client and server, you should only need to HTML encode just before populating the DOM.

    2. If the values were to be saved as HTML entities in MongoDB, would this make searching the database for content difficult? For example, searching for <h1>hello there!</h1> <a href="">link</a> wouldn't return any results because the value in the database was &#x3C;h1&#x3E;hello there!&#x3C;/h1&#x3E; &#x3C;a href=&#x22;&#x22;&#x3E;link&#x3C;/a&#x3E;

    I figured it would make database entries look messy if they were filled with HTML encoded values, so I store the sanitized, validated entries "as is".

    3. In my reading about securing web forms, much has been said about client side practices being fairly redundant as anything can be changed in the DOM, JavaScript can be disabled, and requests can be made directly to the API endpoint using curl or postman and therefore bypass any client side approaches.

    4. With that said should sanitization (DOMPurify), validation (validator.js) and encoding (he) be performed either: 1) client side only 2) client side and server side or 3) server side only?

    Option 2, sanitize and validate input on client and server.