I am trying to secure, as best as possible, a comment form in a non-CMS environment with no user authentication.
The form should be secure against both browser and curl/postman type requests.
Environment
Backend - Node.js, MongoDB Atlas and Azure web app.
Frontend - jQuery.
Below is a detailed, but hopefully not too overwhelming, overview of my current working implementation.
Following that are my questions about the implementation.
Related Libraries Used
Helmet - helps secure Express apps by setting various HTTP headers, including Content Security Policy
reCaptcha v3 - protects against spam and other types of automated abuse
DOMPurify - an XSS sanitizer
validator.js - a library of string validators and sanitizers
he - an HTML entity encoder/decoder
The general flow of data is:
/*
on click event:
- get sanitized data
- perform some validations
- html encode the values
- get recaptcha v3 token from google
- send all data, including token, to server
- send token to google to verify
- if the response 'score' is above 0.5, add the submission to the database
- return the entry to the client and populate the DOM with the submission
*/
POST request - browser
// test input:
// <script>alert("hi!")</script><h1>hello there!</h1> <a href="">link</a>
// sanitize the input
var sanitized_input_1_text = DOMPurify.sanitize($input_1.val().trim(), { SAFE_FOR_JQUERY: true });
var sanitized_input_2_text = DOMPurify.sanitize($input_2.val().trim(), { SAFE_FOR_JQUERY: true });
// validation - make sure input is between 1 and 140 characters
var input_1_text_valid_length = validator.isLength(sanitized_input_1_text, { min: 1, max: 140 });
var input_2_text_valid_length = validator.isLength(sanitized_input_2_text, { min: 1, max: 140 });
// if validations pass
if (input_1_text_valid_length === true && input_2_text_valid_length === true) {
/*
encode the sanitized input
not sure if i should encode BEFORE adding to MongoDB
or just add to database "as is" and encode BEFORE displaying in the DOM with $("#output").html(html_content);
*/
var sanitized_encoded_input_1_text = he.encode(sanitized_input_1_text);
var sanitized_encoded_input_2_text = he.encode(sanitized_input_2_text);
// define parameters to send to database
var parameters = {};
parameters.input_1_text = sanitized_encoded_input_1_text;
parameters.input_2_text = sanitized_encoded_input_2_text;
// get token from google and send token and input to database
// see: https://developers.google.com/recaptcha/docs/v3#programmatically_invoke_the_challenge
grecaptcha.ready(function() {
grecaptcha.execute('site-key-here', { action: 'submit' }).then(function(token) {
parameters.token = token;
jquery_ajax_call_to_my_api(parameters);
});
});
}
POST request - server
var secret_key = process.env.RECAPTCHA_SECRET_SITE_KEY;
var token = req.body.token;
var url = `https://www.google.com/recaptcha/api/siteverify?secret=${secret_key}&response=${token}`;
// verify recaptcha token with google
var response = await fetch(url);
var response_json = await response.json();
var score = response_json.score;
var document = {};
/*
if google's response 'score' is greater than 0.5,
add submission to the database and populate client DOM with $("#output").prepend(html);
see: https://developers.google.com/recaptcha/docs/v3#interpreting_the_score
*/
if (score >= 0.5) {
// add submission to database
// return submission to client to update the DOM
// DOM will just display this text: <h1>hello there!</h1> <a href="">link</a>
}
GET request on page load
Logic/Assumptions:
$("#output").html(html_content);
POST request from curl, Postman etc.
Logic/Assumptions:
Helmet configuration on server
app.use(
helmet({
contentSecurityPolicy: {
directives: {
defaultSrc: ["'self'"],
scriptSrc: ["'self'", "https://somedomain.io", "https://maps.googleapis.com", "https://www.google.com", "https://www.gstatic.com"],
styleSrc: ["'self'", "fonts.googleapis.com", "'unsafe-inline'"],
fontSrc: ["'self'", "fonts.gstatic.com"],
imgSrc: ["'self'", "https://maps.gstatic.com", "https://maps.googleapis.com", "data:"],
frameSrc: ["'self'", "https://www.google.com"]
}
},
})
);
Questions
Should I add values to the MongoDB database as HTML encoded entities OR store them "as is" and just encode them before populating the DOM with them?
If the values were to be saved as HTML entities in MongoDB, would this make searching the database for content difficult? For example, searching for "<h1>hello there!</h1> <a href="">link</a>"
wouldn't return any results because the value in the database was the encoded form, for example &lt;h1&gt;hello there!&lt;/h1&gt; &lt;a href=&quot;&quot;&gt;link&lt;/a&gt;
In my reading about securing web forms, much has been said about client-side practices being fairly redundant, as anything can be changed in the DOM, JavaScript can be disabled, and requests can be made directly to the API endpoint using curl or Postman, bypassing any client-side approach.
With that said, should sanitization (DOMPurify), validation (validator.js) and encoding (he) be performed: 1) client side only, 2) client side and server side, or 3) server side only?
For thoroughness, here is another related question:
Do any of the following components do any automatic escaping or HTML encoding when sending data from client to server? I ask because if they do, it may make some manual escaping or encoding unnecessary.
After reading more around the topic, this is the approach I came up with:
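On click event:
- rate limit `POST` requests on a certain route to X per X milliseconds (by IP address)
  (this required adding `app.set('trust proxy', true)` in the node server file in order for the rate limiter to pick up the user's actual IP address - see Express behind proxies)
- if the reCAPTCHA score is at least `0.5`, perform the same sanitization and validations again on the server
- if the validations pass, add the entry to the database with a `moderated` flag value of `false`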
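A minimal sketch of the rate-limiting step, assuming a single Node process and an in-memory store. The names `createRateLimiter`, `windowMs` and `max` are made up for illustration and are not the API of any particular rate limiter package; the Express usage in the trailing comment is likewise hypothetical.

```javascript
// Fixed-window rate limiter keyed by client IP (in-memory, single process).
// "now" is injectable so the behaviour can be checked without waiting.
function createRateLimiter({ windowMs, max, now = Date.now }) {
  const hits = new Map(); // ip -> { count, windowStart }

  return function isAllowed(ip) {
    const t = now();
    const entry = hits.get(ip);

    // First request from this IP, or the previous window expired: start a new window.
    if (!entry || t - entry.windowStart >= windowMs) {
      hits.set(ip, { count: 1, windowStart: t });
      return true;
    }

    entry.count += 1;
    return entry.count <= max;
  };
}

// Hypothetical usage in an Express app (req.ip only reflects the real client
// address behind a proxy when 'trust proxy' is configured):
//
//   const allowPost = createRateLimiter({ windowMs: 60000, max: 5 });
//   app.post('/entries', (req, res, next) => {
//     if (!allowPost(req.ip)) return res.status(429).send('Too many requests');
//     next();
//   });
```

An in-memory map like this is reset on restart and is not shared between instances, so a hosted app scaled across several instances would need a shared store instead.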
Rather than immediately return entries to the browser, I decided instead to require a process of manual moderation, which involves changing the `moderated` value of an entry to `true`. Whilst it takes away the immediacy of the response for the user, it makes it less tempting for spammers etc. if responses aren't immediately published.
The `GET` request on page load then returns all entries that are `moderated: true`.
The code looked something like this:
POST request - browser
// sanitize the input
var sanitized_input_1_text = DOMPurify.sanitize($input_1.val().trim(), { SAFE_FOR_JQUERY: true });
var sanitized_input_2_text = DOMPurify.sanitize($input_2.val().trim(), { SAFE_FOR_JQUERY: true });
// validation - make sure input is between 1 and 140 characters
var input_1_text_valid_length = validator.isLength(sanitized_input_1_text, { min: 1, max: 140 });
var input_2_text_valid_length = validator.isLength(sanitized_input_2_text, { min: 1, max: 140 });
// validation - regex to only allow certain characters
// for pattern, see: https://stackoverflow.com/q/63895992
var pattern = /^(?!.*([ ,'-])\1)[a-zA-Z]+(?:[ ,'-]+[a-zA-Z]+)*$/;
var input_1_text_valid_characters = validator.matches(sanitized_input_1_text, pattern, "gm");
var input_2_text_valid_characters = validator.matches(sanitized_input_2_text, pattern, "gm");
// if validations pass
if (input_1_text_valid_length === true && input_2_text_valid_length === true && input_1_text_valid_characters === true && input_2_text_valid_characters === true) {
// define parameters to send to database
var parameters = {};
parameters.input_1_text = sanitized_input_1_text;
parameters.input_2_text = sanitized_input_2_text;
// get token from google and send token and input to database
// see: https://developers.google.com/recaptcha/docs/v3#programmatically_invoke_the_challenge
grecaptcha.ready(function() {
grecaptcha.execute('site-key-here', { action: 'submit_entry' }).then(function(token) {
parameters.token = token;
jquery_ajax_call_to_my_api(parameters);
});
});
}
POST request - server
var secret_key = process.env.RECAPTCHA_SECRET_SITE_KEY;
var token = req.body.token;
var url = `https://www.google.com/recaptcha/api/siteverify?secret=${secret_key}&response=${token}`;
// verify recaptcha token with google
var response = await fetch(url);
var response_json = await response.json();
var score = response_json.score;
var document = {};
// if google's response 'score' is at least 0.5, continue
// see: https://developers.google.com/recaptcha/docs/v3#interpreting_the_score
if (score >= 0.5) {
  // perform all the same sanitizations and validations to protect against
  // POST requests direct to the API via curl or postman etc
  // if validations pass, add entry to the database with `moderated: false` property
}
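The verification step above can be sketched a little more defensively. Google's documentation describes `siteverify` as a POST endpoint, and the v3 response carries `success` and `action` alongside `score`, so this sketch sends the secret in the request body and checks all three fields. `verifyRecaptcha`, `isHuman` and the injectable `fetchImpl` parameter are hypothetical names introduced here, and the default assumes the global `fetch` available in Node 18+.

```javascript
// Sketch: verify a reCAPTCHA v3 token on the server.
// fetchImpl is injectable so the function can be exercised without network access.
async function verifyRecaptcha(token, secret, fetchImpl = fetch) {
  // Send secret and token as a form-encoded POST body rather than in the URL.
  const body = new URLSearchParams({ secret: secret, response: token });
  const response = await fetchImpl('https://www.google.com/recaptcha/api/siteverify', {
    method: 'POST',
    body: body
  });
  return response.json();
}

// Accept only a successful verification, for the expected action,
// with a score at or above the threshold.
function isHuman(result, expectedAction, minScore = 0.5) {
  return result.success === true &&
    result.action === expectedAction &&
    typeof result.score === 'number' &&
    result.score >= minScore;
}

// Hypothetical usage inside an async route handler:
//
//   const result = await verifyRecaptcha(req.body.token, process.env.RECAPTCHA_SECRET_SITE_KEY);
//   if (!isHuman(result, 'submit_entry')) return res.status(403).end();
```

Checking `action` against the value passed to `grecaptcha.execute` (here `submit_entry`) guards against a token obtained for a different action being replayed on this route.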
GET request - browser
Logic:
- return all entries that have the `moderated: true` property
Helmet configuration on server
app.use(
helmet({
contentSecurityPolicy: {
directives: {
defaultSrc: ["'self'"],
scriptSrc: ["'self'", "https://maps.googleapis.com", "https://www.google.com", "https://www.gstatic.com"],
connectSrc: ["'self'", "https://some-domain.com", "https://some.other.domain.com"],
styleSrc: ["'self'", "fonts.googleapis.com", "'unsafe-inline'"],
fontSrc: ["'self'", "fonts.gstatic.com"],
imgSrc: ["'self'", "https://maps.gstatic.com", "https://maps.googleapis.com", "data:", "https://another-domain.com"],
frameSrc: ["'self'", "https://www.google.com"]
}
},
})
);
In answer to my questions in the OP:
- Should I add values to the MongoDB database as HTML encoded entities OR store them "as is" and just encode them before populating the DOM with them?
As long as the input is sanitised and validated on both client and server, you should only need to HTML encode just before populating the DOM.
- If the values were to be saved as HTML entities in MongoDB, would this make searching the database for content difficult? For example, searching for
<h1>hello there!</h1> <a href="">link</a>
wouldn't return any results because the value in the database was the encoded form, for example &lt;h1&gt;hello there!&lt;/h1&gt; &lt;a href=&quot;&quot;&gt;link&lt;/a&gt;
I figured it would make database entries look messy if they were filled with HTML encoded values, so I store the sanitized, validated entries "as is".
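To make the search concern concrete, here is a small sketch using the test input from the question. `escapeHtml` is a hypothetical stand-in for an encoder such as he, written out by hand so the example has no dependencies; the exact entities a real library emits may differ.

```javascript
// Hypothetical helper: escape the characters that are significant in HTML
// text and attribute values.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Value stored "as is" in the database.
const stored = '<h1>hello there!</h1> <a href="">link</a>';
const query = '<h1>hello there!</h1>';

// Searching the raw value finds the text.
const foundRaw = stored.includes(query);

// Searching an encoded copy for the same text does not,
// because every "<" and ">" has been replaced by an entity.
const foundEncoded = escapeHtml(stored).includes(query);

// Encode only at output time, for example:
//   $("#output").html(escapeHtml(stored));
```

Where no markup needs to be rendered at all, jQuery's `.text()` inserts a string as plain text, which avoids the manual encoding step.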
- In my reading about securing web forms, much has been said about client-side practices being fairly redundant, as anything can be changed in the DOM, JavaScript can be disabled, and requests can be made directly to the API endpoint using curl or Postman, bypassing any client-side approach.
With that said, should sanitization (DOMPurify), validation (validator.js) and encoding (he) be performed: 1) client side only, 2) client side and server side, or 3) server side only?
Option 2: sanitize and validate input on both client and server.
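A rough sketch of what repeating the checks on the server could look like, using only built-in string and RegExp methods so it runs without any packages. It reuses the pattern and the 1 to 140 character limit from the client code; `validateField` and `buildEntry` are hypothetical helper names, and a real server would probably call validator.js and a sanitizer here instead.

```javascript
// Same pattern as the client-side check: letters separated by
// spaces, commas, apostrophes or hyphens, with no separator doubled.
const PATTERN = /^(?!.*([ ,'-])\1)[a-zA-Z]+(?:[ ,'-]+[a-zA-Z]+)*$/;

// Returns the trimmed value if it is acceptable, otherwise null.
function validateField(value) {
  // Rejecting non-strings also blocks object payloads such as { "$gt": "" }
  // that a JSON body parser would otherwise pass through to a query.
  if (typeof value !== 'string') return null;
  const trimmed = value.trim();
  if (trimmed.length < 1 || trimmed.length > 140) return null;
  return PATTERN.test(trimmed) ? trimmed : null;
}

// Builds the document to insert, or returns null if any field is invalid.
// New entries always start unmoderated.
function buildEntry(body) {
  const input1 = validateField(body.input_1_text);
  const input2 = validateField(body.input_2_text);
  if (input1 === null || input2 === null) return null;
  return { input_1_text: input1, input_2_text: input2, moderated: false };
}
```

Because the server builds the document itself, a client cannot set `moderated` to `true` by adding it to the request body.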