Tags: html, css, sanitization

How to safely assign untrusted input to CSS custom properties without JavaScript?


Let's say I have an object of string keys and string values, and I'd like to write these as CSS custom properties to some HTML generated by the server. How could I do so safely?

By safely I mean that

  • if at all possible, the custom property declaration should not cause a CSS syntax error that prevents the browser from correctly parsing other style declarations or parts of the HTML document. If this is for some reason not possible, then the key-value pair should be omitted.
  • Even more strongly, there should be no possibility of cross-site scripting with this.

For simplicity's sake, I'm going to limit keys to only allow characters within the class [a-zA-Z0-9_-].

From reading the CSS specification, and some personal testing, I think I can get quite far by taking the value through the following steps:

  • Look for strings
  • Ensure that every quote is followed somewhere by another (not escaped) quote of the same type (" or '). If that's not the case, discard this key/value pair.
  • Ensure that every opening brace {([ outside of a string has a matching closing brace. If not, discard this key-value pair.
  • Escape all instances of < with \3C, and all instances of > with \3E.
  • Escape all instances of ; with \3B.

I came up with the steps above based on this CSS syntax specification
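
As a rough sketch of the steps above, assuming a single pass over the value is enough: the helper name sanitizeCustomPropertyValue and the two lookup tables are hypothetical. It emits the escapes with a trailing space (for example "\3C "), because a CSS hex escape absorbs following hex digits, so "\3Cb" would otherwise be read as U+03CB. It covers only the listed steps and does not look at CSS comments.

```javascript
// Hypothetical helper (name is illustrative): returns the escaped value,
// or null when the key/value pair should be discarded.
const HEX_ESCAPES = new Map([['<', '\\3C '], ['>', '\\3E '], [';', '\\3B ']]);
const CLOSERS = new Map([['(', ')'], ['[', ']'], ['{', '}']]);

function sanitizeCustomPropertyValue(value) {
  const expected = []; // closing brackets still owed, innermost last
  let quote = null;    // quote character of the string we are inside, if any
  let out = '';

  for (let i = 0; i < value.length; i++) {
    const ch = value[i];

    if (ch === '\\') {
      // A trailing backslash would escape whatever follows the value.
      if (i + 1 === value.length) return null;
      const next = value[++i];
      // Keep the escape pair, but never let a raw <, > or ; through.
      out += HEX_ESCAPES.get(next) ?? ch + next;
      continue;
    }

    if (quote !== null) {
      // An unescaped line break inside a string is a CSS parse error.
      if (ch === '\n' || ch === '\r' || ch === '\f') return null;
      if (ch === quote) quote = null;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (CLOSERS.has(ch)) {
      expected.push(CLOSERS.get(ch));
    } else if (ch === ')' || ch === ']' || ch === '}') {
      if (expected.pop() !== ch) return null;
    }
    out += HEX_ESCAPES.get(ch) ?? ch;
  }

  // Every string and every bracket must have been closed.
  return quote === null && expected.length === 0 ? out : null;
}
```

With this sketch, sanitizeCustomPropertyValue('"unterminated') returns null, while a value such as 'a;b' comes back with the semicolon escaped.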

For context, these properties may be used by a user's custom styling which we insert elsewhere, but the same object is also used as template data in a template, so it may contain a mix of strings intended as content, and ones intended as CSS variables. I feel like the algorithm above strikes a nice balance: it is quite simple, yet it does not risk discarding too many key-value pairs that could potentially be useful in CSS (even considering future additions to CSS), but I'd like to make sure I'm not missing something.


Here's some JS code showing what I'm trying to achieve. obj is the object in question, and preprocessPairs is a function that takes the object, and preprocesses it, dropping/reformatting the values as described in the steps above.

function generateThemePropertiesTag(obj) {
  obj = preprocessPairs(obj);
  return `<style>
:root {
${Object.entries(obj).map(([key, value]) => {
  return `--theme-${key}: ${value};`
}).join("\n")}
}
</style>`
}

So when given an object like this

{
  "color": "#D3A",
  "title": "The quick brown fox"
}

I'd expect the CSS to look like this:

:root {
--theme-color: #D3A;
--theme-title: The quick brown fox;
}

And while --theme-title is a pretty useless custom variable in terms of use in CSS, it doesn't actually break the stylesheet, because CSS ignores properties it doesn't understand.


Solution

  • We can do this with just regular expressions and a few simple algorithms, without relying on one specific language; I hope this is what you need here.

    Since you state that the object keys are within [a-zA-Z0-9_-], what remains is to parse the values somehow.

    Value patterns

    We can break the values into categories and see what we can encounter (the patterns might be slightly simplified for clarity):

    1. '.*' (string surrounded by apostrophes; greedy)
    2. ".*" (string surrounded by double quotes; greedy)
    3. [+-]?\d+(\.\d+)?(%|[A-Za-z]+)? (integers and decimal numbers, optionally per cent or with a unit)
    4. #[0-9A-Fa-f]{3,6} (colours)
    5. [A-Za-z0-9_-]+ (keywords, named colours, things like "ease-in")
    6. ([\w-]+)\([^)]+\) (functions like url(), calc() etc.)
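
    To make the categories concrete, here is a minimal sketch that tests one token against each pattern in turn. The names TOKEN_PATTERNS and classifyToken are hypothetical. The sketch assumes letter ranges spelled as [A-Za-z], uses escape-aware string patterns instead of the greedy ones, and accepts only 3, 4, 6 or 8 hex digits for colours.

```javascript
// Hypothetical table of anchored patterns, one per category above.
// Order matters: "10px" would also satisfy the keyword pattern.
const TOKEN_PATTERNS = {
  singleQuoted: /^'(?:[^'\\]|\\.)*'$/,
  doubleQuoted: /^"(?:[^"\\]|\\.)*"$/,
  number: /^[+-]?\d+(?:\.\d+)?(?:%|[A-Za-z]+)?$/,
  colour: /^#(?:[0-9A-Fa-f]{3,4}){1,2}$/, // 3, 4, 6 or 8 hex digits
  keyword: /^[A-Za-z0-9_-]+$/,
  func: /^[\w-]+\([^)]+\)$/,
};

// Returns the name of the first matching category, or null.
function classifyToken(token) {
  for (const [name, pattern] of Object.entries(TOKEN_PATTERNS)) {
    if (pattern.test(token)) return name;
  }
  return null;
}
```

    A token such as "[x]" matches none of these and yields null, which is the case a range like [A-z] would wrongly let through as a keyword.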

    First filtering

    I can imagine there is some filtering you can do before trying to recognize these patterns, starting with trimming the value string. As you mention, < and > can be escaped at the beginning of the preprocessPairs() function, because they do not appear as part of any of the patterns above. If you don't expect unescaped semicolons anywhere, you may escape them as well.
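
    A minimal sketch of this first filtering step, assuming the six-digit form of the CSS hex escape (which needs no terminating space). The helper name escapeSpecials is hypothetical. Note the regular expression with the g flag: String.prototype.replace with a plain string pattern would replace only the first occurrence.

```javascript
// Hypothetical helper: trim the value, then replace every <, > and ;
// with a six-digit CSS hex escape such as \00003C.
function escapeSpecials(value) {
  return value.trim().replace(/[<>;]/g, (ch) =>
    '\\' + ch.codePointAt(0).toString(16).toUpperCase().padStart(6, '0'));
}
```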

    Recognizing patterns

    Then we can try to recognize these patterns in the value, and for each pattern we might need to run the filtering again. We expect the patterns to be separated by one or more whitespace characters.

    Supporting multi-line strings should be OK, since a line break inside a string is written as an escaped newline character.

    Language contexts

    We need to recognize that we are filtering for at least two contexts: HTML and CSS. As we include the styles in the <style> element, the input must be safe for that, and at the same time it must be valid CSS. Luckily, you do not include the CSS in the style attribute of an element, so that makes it slightly easier.
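
    For the HTML side, a defensive last check on the finished stylesheet text could look like the sketch below. The function name isSafeForStyleElement is hypothetical, and the check is meant as a complement to the escaping, not a replacement for it.

```javascript
// Hypothetical last-line check on the generated stylesheet text.
function isSafeForStyleElement(css) {
  // The HTML parser ends a <style> element at the first "</style",
  // in any letter case, regardless of CSS strings or comments.
  return !/<\/style/i.test(css);
}
```

    If the check fails, the safest reaction is probably to drop the offending pair rather than try to repair it.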

    Filtering based on value pattern

    1. Strings surrounded by apostrophes - we do not care about anything but apostrophes and semicolons, so we need to find unescaped instances of these characters in the string and escape them
    2. Same as above, just with double quotes
    3. Should be OK
    4. Should be OK
    5. Pretty much OK
    6. This is the fun part

    Points 1-5 are quite easy; combined with the simple filtering and trimming above, they cover most values. With some additions (I have no idea what the effect on performance would be) the check could even validate units, keywords and so on.

    Point 6 is a considerably bigger challenge than the others. You may decide to simply forbid url() in this custom styling. That leaves checking the arguments of the remaining functions: for example, you might escape semicolons there and perhaps re-apply the patterns inside the function with small adjustments, e.g. for calc().
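
    One way to handle functions is an allowlist instead of a blocklist; here is a minimal sketch under that assumption. The set ALLOWED_FUNCTIONS and the name usesOnlyAllowedFunctions are hypothetical, and the list of names is only an example. The check is deliberately conservative: text that merely looks like a function call inside a string is rejected too.

```javascript
// Hypothetical allowlist; extend it as new CSS functions are needed.
const ALLOWED_FUNCTIONS = new Set([
  'calc', 'var', 'min', 'max', 'clamp', 'rgb', 'rgba', 'hsl', 'hsla',
]);

// Returns true when every "name(" in the value is on the allowlist.
// url() is not listed, so it is rejected like any unknown function.
function usesOnlyAllowedFunctions(value) {
  for (const match of value.matchAll(/([\w-]+)\(/g)) {
    if (!ALLOWED_FUNCTIONS.has(match[1].toLowerCase())) return false;
  }
  return true;
}
```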

    Conclusion

    Broadly, that's my point of view. With a bit of adjustment, these regexes should complement what you already have and allow as much flexibility in the input CSS as possible, without requiring code changes every time a new CSS feature appears.

    Example

    function preprocessPairs(obj) {
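      // Catch-all regular expression
      // Explanation:
      // (                                       Start of alternatives
      //   \w+\(.+?\)|                           1st alternative - function
      //   ".+?(?<!\\)"|                         2nd alternative - string with double quotes
      //   '.+?(?<!\\)'|                         3rd alternative - string with apostrophes
      //   [+-]?\d+(?:\.\d+)?(?:%|[A-Za-z]+)?|   4th alternative - integer/decimal number, optionally per cent or with a unit
      //   #[0-9A-Fa-f]{3,6}|                    5th alternative - colour
      //   [A-Za-z0-9_-]+|                       6th alternative - keyword
      //   ''|                                   7th alternative - empty string
      //   ""                                    8th alternative - empty string
      // )
      // [\s,]*                                  Trailing whitespace or commas
      // Note: letter ranges are spelled [A-Za-z] and [A-Fa-f]. A range such as [A-z]
      // would also match [ \ ] ^ _ ` and so let brackets and backslashes through as
      // keywords. As a result, an escape sequence outside a string or function no
      // longer matches, and such a value is discarded.
      const regexA = /(\w+\(.+?\)|".+?(?<!\\)"|'.+?(?<!\\)'|[+-]?\d+(?:\.\d+)?(?:%|[A-Za-z]+)?|#[0-9A-Fa-f]{3,6}|[A-Za-z0-9_-]+|''|"")[\s,]*/g;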
    
      // newObj contains filtered testObject
      const newObj = {};
    
      // Loop through all object properties
      Object.entries(obj).forEach(([key, value]) => {
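        // Replace every <, > and ; (replace() with a string pattern would only
        // replace the first occurrence, so use regular expressions with the g flag)
        value = value.trim().replace(/</g, '\\00003C').replace(/>/g, '\\00003E').replace(/;/g, '\\00003B');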
    
        // Use catch-all regex to split value into specific elements
        const matches = [...value.matchAll(regexA)];
    
        // Now try to build back the original value string from regex matches.
        // If these strings are equal, the value is what we expected.
        // Otherwise it contained some unexpected markup or elements and should
        // be therefore discarded.
        // We specifically set to ignore all occurrences of url() and @import
        let buildBack = '';
        matches.forEach((match) => {
          if (Array.isArray(match) && match.length >= 2 && match[0].match(/url\(.+?\)/gi) === null && match[0].match(/@import/gi) === null) {
            buildBack += match[0];
          }
        });
    
        console.log('Compare\n');
        console.log(value);
        console.log(buildBack);
        console.log(value === buildBack);
    
        if (value === buildBack) {
          newObj[key] = value;
        }
      });
    
      return newObj;
    }
    

    Please comment, discuss, criticize, and let me know, if I forgot to touch some topic you are particularly interested in.
