Tags: coldfusion, xss, whitelist

What are my options for white-listing HTML in ColdFusion?


I want to allow my users to input HTML.

Requirements

  1. Allow a specific set of HTML tags.
  2. Preserve characters (do not encode ã into &atilde;, for example)

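Both requirements can be met by a parser-based whitelist that drops everything not explicitly allowed and escapes only the markup-significant characters in text. This is a minimal sketch that uses Python's standard-library `html.parser` purely as a language-neutral illustration, since the thread contains no code. The tag and attribute lists, the `sanitize()` helper and the URL-scheme rule are hypothetical choices, not an existing ColdFusion library, and the sketch is not a vetted security component: it does not balance unclosed tags, and it rejects relative links.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist; adjust to the tags your users actually need.
ALLOWED_TAGS = {"p", "br", "b", "i", "em", "strong", "ul", "ol", "li", "a"}
# Hypothetical per-tag attribute whitelist; every other attribute
# (onclick, style, ...) is dropped.
ALLOWED_ATTRS = {"a": {"href", "title"}}
# Elements whose *content* is discarded along with the tags.
DROP_CONTENT = {"script", "style"}
# Elements that never get a closing tag in the output.
VOID_TAGS = {"br"}
SAFE_URL_PREFIXES = ("http://", "https://", "mailto:")


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        # convert_charrefs=True decodes entities in text, so "&atilde;"
        # and a literal "ã" both reach handle_data as "ã".
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return  # disallowed tag: drop the tag itself, keep its text
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_URL_PREFIXES):
                continue  # rejects javascript:, data:, relative URLs, etc.
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Escape only &, < and >; every other character, including
            # non-ASCII such as "ã", is emitted unchanged (requirement 2).
            self.out.append(escape(data, quote=False))

    # Self-closing tags like <br/> go through HTMLParser's default
    # handle_startendtag (start + end). Comments, doctypes and processing
    # instructions hit the no-op defaults, so they are silently dropped.


def sanitize(html_text: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)
```

For example, `sanitize('<p onclick="x()">Olá, <b>&atilde;</b><script>alert(1)</script></p>')` returns `<p>Olá, <b>ã</b></p>`: the event handler and script block are gone, and the accented characters come out as literal characters rather than entities.
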
Existing options

  1. AntiSamy. Unfortunately, AntiSamy encodes special characters and thus breaks requirement 2.
  2. Native ColdFusion functions (HTMLCodeFormat(), etc.) don't work because they encode all HTML into entities and thus fail requirement 1.
  3. I found this set of functions somewhere, but I have no way of telling how secure they are: http://pastie.org/2072867

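If AntiSamy is otherwise acceptable, one possible workaround for requirement 2 is to post-process its output and turn character references back into literal characters, except the ones that would reintroduce markup. This is a sketch under stated assumptions, again in Python as a stand-in: it assumes the sanitizer's output uses standard named or numeric character references (AntiSamy's exact output format is not verified here), and `decode_safe_refs()` is a hypothetical helper. It must only ever run on already-sanitized output, never on raw user input.

```python
import re
from html import unescape
from html.entities import html5

# Characters that must stay encoded because they are significant in markup.
MARKUP_CHARS = set("&<>\"'")

# Named (&atilde;), decimal (&#227;) and hex (&#xE3;) character references.
CHARREF_RE = re.compile(r"&(?:#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")


def decode_safe_refs(sanitized_html: str) -> str:
    """Turn character references back into literal characters, except those
    that would reintroduce markup-significant characters."""

    def replace(match: re.Match) -> str:
        ref = match.group(0)
        if ref.startswith("&#"):
            decoded = unescape(ref)
        else:
            # html5 keys include the trailing ';', e.g. "atilde;".
            # Unknown names fall back to the original reference.
            decoded = html5.get(ref[1:], ref)
        if any(ch in MARKUP_CHARS for ch in decoded):
            return ref  # &amp;, &lt;, &#60;, unknown names, ... stay encoded
        return decoded

    return CHARREF_RE.sub(replace, sanitized_html)
```

With this, `<p>S&atilde;o Paulo &amp; &lt;b&gt;</p>` becomes `<p>São Paulo &amp; &lt;b&gt;</p>`: the accented letter is restored, while the escaped ampersand and angle brackets stay inert.
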
So what are my options? Are there existing libraries for this?


Solution

  • Portcullis works well in ColdFusion for dealing with specific attack types. Over time I've also used a couple of regex solutions I found on the web that have worked well, though they weren't nearly as fleshed out. In 15 years (10 as a CMS developer), nothing I've built has been hacked... knock on wood.

    When developing input fields of any type, it's good to look at the problem from different angles. You've got the UI side, which includes both usability and client-side validation. Yes, client-side validation can be bypassed, but JavaScript-based validation is quicker, more responsive, and rates higher on the magical UI scale than interrupting the user from the back end or simply making things "disappear" without warning. It also speeds up back-end validation, because it does the initial screening. So it's not an "instead of" but an "in addition to" solution, and one that shouldn't be ignored.

    Also on the UI front, giving your users a good-quality editor can make a huge difference. My personal favorite is CKEditor, simply because it's the only one that can handle Microsoft Word markup on the front end, keeping it far away from my DB. It seems silly, but Word HTML is valid, so it won't set off any red flags. On a moderately sized document, though, it will quickly exceed a DB field's maximum insert size, believe it or not. A good editor not only reduces the amount of junk HTML that comes in, but also makes things faster for the user: win/win.

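    If Word-heavy HTML can overflow a column, checking the encoded size before the insert lets you show the user a friendly message instead of a database error. This is a minimal sketch, in Python as a stand-in for the ColdFusion side; `fits_in_field()` and the limit are hypothetical. It counts bytes rather than characters, so the encoding matters: `ã` takes 2 bytes in UTF-8, while `&atilde;` takes 8.

```python
# Hypothetical limit: use your column's real capacity. A MySQL TEXT
# column, for example, holds at most 65,535 bytes.
MAX_FIELD_BYTES = 65_535


def fits_in_field(html_text: str, max_bytes: int = MAX_FIELD_BYTES,
                  encoding: str = "utf-8") -> bool:
    """Check the encoded size (bytes, not characters) before the insert."""
    return len(html_text.encode(encoding)) <= max_bytes
```

    For example, ten `ã` characters fit in a 20-byte field, but the same text stored as ten `&atilde;` entities does not.
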
    I personally encode and decode my characters; it has always worked well, so I've never changed that practice.