February 26, 2007


Update (5/25/2007):

Sam Ruby ported this Sanitizer to HTML5lib. For most purposes, that’s a much more robust foundation, so all my future efforts will be devoted to the HTML5lib version.


Rudimentary documentation is available.

The original version of the Sanitizer, described in this post, can be found here.

What free time might otherwise have been devoted to blogging last week was devoted to another matter.

On Monday, I discovered that Instiki, including my MathML-enabled branch, was vulnerable to Cross-Site Scripting (XSS). That is, visitors to an Instiki Wiki could inject malicious JavaScript code onto your page.

Rails has a built-in sanitization filter, but this was not being applied. So, my first impulse was to apply Rails’s built-in sanitization filter. Unfortunately, it suffers from two defects:

  1. It has trouble with malformed HTML. This was not a problem for me, as I intended to apply it to the well-formed XHTML output by Maruku.
  2. Even on well-formed XHTML, it doesn’t actually work worth a damn. All but the lamest of script-injection tricks sail right past it.

I turned to Google, and found a Rails plugin that is supposed to improve upon it. It works considerably better, but was still not adequate to my needs.

Finally, I turned to Sam Ruby for advice. Sam pointed me to the sanitization code he wrote for the Universal FeedParser.

So I sat down and wrote a sanitization function [1] for Instiki based, in part, on Sam’s Python code.

  1. It applies a white-list of XHTML+MathML+SVG elements, allowing through only the (extensive list of) known safe ones
  2. It applies a similar white-list of XHTML+MathML+SVG attributes.
  3. For attributes whose values are URIs (e.g. href, src, xlink:href, …), it applies a white-list of safe URI schemes (after normalizing, to foil attempts at obfuscating the URI).
  4. Inline style attributes are parsed, and only a known safe (but still extensive) set of CSS properties and values is allowed.
  5. It handles case-sensitive element and attribute names, which is important for SVG, which uses camel-cased names.
  6. It comes with unit tests, lots of unit tests.
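
To make the first three items (and the case-sensitivity point) concrete, here is a minimal sketch. It is not the actual Instiki code: the whitelists are tiny and hypothetical, the helper names `safe_uri?` and `filter_element` are invented for illustration, and attribute values are assumed to have been entity-decoded already by whatever tokenizer feeds them in.

```ruby
require 'set'

# Hypothetical, deliberately tiny whitelists; real lists are far longer.
ALLOWED_ELEMENTS   = Set.new(%w[a p em strong svg linearGradient math mi])
ALLOWED_ATTRIBUTES = Set.new(%w[href src title xlink:href viewBox])
URI_ATTRIBUTES     = Set.new(%w[href src xlink:href])
ALLOWED_SCHEMES    = Set.new(%w[http https mailto ftp])

# True if the URI is relative or uses a whitelisted scheme.
def safe_uri?(value)
  # Normalize first: drop whitespace and control characters, then lowercase,
  # so that something like "jav\tascript:" cannot hide its scheme.
  normalized = value.gsub(/[\x00-\x20\x7f]+/, '').downcase
  scheme = normalized[/\A([a-z][a-z0-9+.\-]*):/, 1]
  scheme.nil? || ALLOWED_SCHEMES.include?(scheme)
end

# Returns the filtered attribute hash for a whitelisted element, or nil if
# the element itself is not allowed. Names are compared case-sensitively,
# so "linearGradient" passes while "lineargradient" does not.
def filter_element(name, attributes)
  return nil unless ALLOWED_ELEMENTS.include?(name)
  attributes.select do |attr, value|
    next false unless ALLOWED_ATTRIBUTES.include?(attr)
    next false if URI_ATTRIBUTES.include?(attr) && !safe_uri?(value)
    true
  end
end
```

The point of the design is that nothing is stripped because it looks dangerous; things survive only because they are on a list. An `onclick` attribute disappears simply because nobody whitelisted it.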

On learning of the vulnerability, over a week ago, I immediately emailed the maintainer of the main Instiki branch, to tell him about the vulnerability and of my intention to provide a fix. Three days later, after coding up my fix, I sent that to him, as well.

Eventually, after much pestering, my changes were committed to the Instiki SVN repository. But I still don’t know when Matthias is going to get around to releasing a new version. He indicated that he’s very busy, and I think I have failed to convince him of the urgency of the matter. Rather than waiting around for a new version, Instiki 0.11 users should fix their installations now. Doing so is easy enough. In fact, it’s almost surely easier and faster than installing a whole new version of Instiki (whenever that should happen to appear).

Update: Instiki 0.11pl1 has been released. It contains the fix for this XSS attack, as well as some other miscellaneous fixes.

If you’re using my distribution of Instiki, you should download the latest version and follow the instructions to upgrade your Instiki installation.

If you’re using Instiki 0.11.0, you should download the latest release. If, for some reason, you don’t want to upgrade, then at a minimum:

  1. Download the following files from the Instiki SVN repository, placing them in the corresponding directories of your Instiki installation: Alternatively, you can download a tarball containing all four of the above files.
  2. Finally, restart Instiki.

If you’re using my branch of Instiki, please don’t use the above lib/chunks/engines.rb file. It’s 0.11.0-specific. The file you want is in the distribution or in my BZR repository.

If you want to test whether your Instiki installation is vulnerable, try typing


on a page which uses the Markdown (or Markdown+itex2MML) filter, or


on a page which uses the Textile filter. Or try

<a href="bar" onclick="alert('fubar');return false;">foo</a>

or

<p style="-moz-binding:url(';;distler/blog/files/warning.xml#xss')">fubar</p>

(or

p{-moz-binding:url(';;distler/blog/files/warning.xml#xss')}. fubar

for you Textile users) or any one of the myriad other script-injection tricks.


More generally, it’s a huge disappointment that Rails does not ship with a decent XSS-sanitization function built-in and enabled by default. I suppose that, if one is building a Rails app which doesn’t accept any user-input content, or which aggressively strips out all vestiges of HTML from that content, then one might not really need one. But people are building Wikis and Blogs and all kinds of “Web 2.0” applications using Rails, many of which either accept HTML, or accept some pseudo-markup that gets translated into HTML.

The fact that the built-in sanitization function

  1. is not enabled, by default
  2. is largely ineffective, when it is enabled

is a huge, potential security hole in each of those Rails applications.

This is not unknown. The Rails bug tracker is filled with open tickets suggesting that TextHelper#sanitize is broken and needs to be fixed. Nothing in this blog post should come as a surprise to anyone in the Rails community. Well, OK, it was a little surprising that Instiki didn’t even avail itself of TextHelper#sanitize. But, given the weakness of the latter, it hardly would have made much difference if it did.

I was, initially, somewhat torn as to whether to publicize this issue on my blog. But, given both the seriousness and the widespread nature of the problem, there’s really no alternative. I don’t think that I could even enumerate all the vulnerable Rails applications, let alone track down and contact their developers about implementing a fix.

At least this way, I’m coming to the table with code, sanitize_html(string), that can be used to fix those applications which turn out to be vulnerable.

1 Since I used the same HTML tokenizer that Rails’s built-in sanitizer does, my code probably also misbehaves on sufficiently malformed HTML. If you are trying to sanitize tag-soup, you need to parse it, using the same error-corrections that browsers do. Your only real hope is to use HTML5lib, of which a Ruby version doesn’t yet exist.

Posted by distler at February 26, 2007 6:41 PM

17 Comments & 6 Trackbacks


I have a white_list helper plugin that I’ve been using in Beast/Mephisto/* that works great. I wrote it with the intention of replacing #sanitize with it in core, and it is currently a candidate for Rails 2.0.

Here’s the plugin if you want to check it out:

Posted by: rick on February 27, 2007 10:47 AM

Rails plugin

As I said above, your plugin was the first thing I looked at, after discovering that sanitize() didn’t work worth a damn.

I had several issues with it (which you can probably figure out, by examining the unit tests that I wrote). In the end, I decided that it was easiest just to write my own function.

If you want something done right …

Posted by: Jacques Distler on February 27, 2007 11:25 AM


Much appreciated!

Posted by: Blake Stacey on February 27, 2007 12:00 PM


If you are trying to sanitize tag-soup, you need to parse it, using the same error-corrections that browsers do. Your only real hope is to use HTML5lib, of which a Ruby version doesn’t yet exist.

Oddly enough, an HTML sanitizer was one of the projects I had in the back of my mind to go in the html5lib examples/ directory. Maybe I should increase its priority a little.

Posted by: jgraham on February 27, 2007 12:45 PM


Using the tarball you’ve provided above (step 1 under “If you’re using Instiki 0.11.0:”), when editing pages you get:

MissingSourceFile in Wiki#save
no such file to load -- maruku/ext/math

engines.rb contains:

require_dependency 'maruku'
require_dependency 'maruku/ext/math'

Since replacing it with the one from svn fixed the problem, the tar possibly has the wrong version of the engines.rb file? Or I could have completely misunderstood…

Posted by: Andy on February 27, 2007 3:24 PM

File mixup


Somehow the lib/chunks/engines.rb file from my distribution crept its way into the tarball for Instiki 0.11.0.

That’s fixed now.

Sorry for the mixup.

Posted by: Jacques Distler on February 27, 2007 3:59 PM


Sorry, rude of me. Should also have said: many thanks for providing the patch!

Posted by: Andy on February 27, 2007 3:26 PM
Read the post S5
Weblog: Musings
Excerpt: All about S5 support in Instiki.
Tracked: March 10, 2007 8:36 PM


How does acts_as_sanitized compare to this?

Posted by: Niko on March 14, 2007 8:54 AM

It’s all an act…

From the 10 seconds spent looking at the source code, it’s clear that acts_as_sanitized is nothing more than a thin wrapper around the sanitize() function that I justifiably ridiculed above. (Well, OK, acts_as_sanitized can, optionally, invoke strip_tags() instead of sanitize().)

Next question?

[The question you should have asked is: “Can I modify acts_as_sanitized to wrap your sanitize_html() function instead?” The answer is, “Yes, … in about 10 seconds.”]

Posted by: Jacques Distler on March 14, 2007 9:11 AM

Re: It’s all an act…

You’re right. Sorry for even bothering you. My fault.

(Though I guess the answer to the question “Can I modify acts_as_sanitized to wrap your sanitize_html() function instead?” would have been “From the 10 seconds spent looking at the source code, it’s clear that you can do this in 10 seconds.” and it would have been the right answer).

Posted by: Niko on March 14, 2007 11:32 AM

Re: It’s all an act…

Isn’t it nice that some questions just answer themselves? ;-)

In any case, it’s great to have a link to the plugin here. Thanks.

Posted by: Jacques Distler on March 14, 2007 11:41 AM



Do you have any interest in turning this into a standalone Rails plugin? If not, would you mind if I did? Please contact me via the email provided if so…


Posted by: Rob Sanheim on March 15, 2007 2:17 PM


You mention extensive white lists…

…I’ve built a similar tool for a site I’m building (that starts with tagsoup parsed data, so it *should* be well formed xhtml), but what I’m missing are those white lists…

…any chance you can make them available?

Posted by: Dale Newfield on April 26, 2007 11:40 AM

Show me the code

All of the code is available from my BZR repository. The sanitize code is here.

Posted by: Jacques Distler on April 26, 2007 12:00 PM

Re: Show me the code

I don’t know Ruby, so I’m having a bit of difficulty understanding that code. I’m currently looking at the sanitize_css function, trying to determine just how you do that. Below is my understanding–please correct me if any of this is wrong.

For each style property specified, if the name is in your list then it is allowed through without interrogating the value. If the name is not in your list, but begins with “background-”, “border-”, “margin-”, or “padding-” then the value is checked against a static list and a regexp that looks like hex colors, rgb triples, or a number with optional unit, and only if the value matches one of those is it let through.

Is this description accurate? Is this the process you intended?

Posted by: Dale Newfield on April 27, 2007 3:58 PM

Re: Show me the code

If you know Python, you can look at Sam Ruby’s original code. My sanitize_css function was a straight transcription of his code from Python to Ruby (an interesting exercise, as I know no Python).

Is this description accurate? Is this the process you intended?

Yes, except you skipped the part about removing some of the bad stuff (like url’s as attribute values).

Posted by: Jacques Distler on April 27, 2007 4:14 PM
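
For readers who, like Dale, find the Ruby hard to follow, the process described in this exchange can be sketched roughly as below. This is an illustration of those steps only, not the actual sanitize_css function: the property and keyword lists are short and hypothetical, the name `sanitize_style` and the `SAFE_VALUE` pattern are invented here, and a real sanitizer needs further checks (for instance, on the values of whitelisted properties, which this sketch passes through untouched as described above).

```ruby
require 'set'

# Hypothetical, abbreviated lists; real ones would be much longer.
ALLOWED_CSS_PROPERTIES = Set.new(%w[color font-size font-weight text-align])
ALLOWED_CSS_KEYWORDS   = Set.new(%w[auto none solid dashed red blue transparent])
SHORTHAND_PREFIXES     = %w[background border margin padding]

# A hex color, an rgb() triple (no internal spaces), or a number with an
# optional unit.
SAFE_VALUE = /\A(?:#[0-9a-f]{3}(?:[0-9a-f]{3})?|rgb\(\d{1,3},\d{1,3},\d{1,3}\)|-?\d+(?:\.\d+)?(?:cm|em|ex|in|mm|pc|pt|px|%)?)\z/i

def sanitize_style(style)
  # The "bad stuff" step: remove url(...) before anything else is examined.
  style = style.gsub(/url\s*\(.*?\)/mi, ' ')
  kept = []
  style.split(';').each do |declaration|
    name, value = declaration.split(':', 2)
    next if name.nil? || value.nil?
    name  = name.strip.downcase
    value = value.strip
    next if value.empty?
    if ALLOWED_CSS_PROPERTIES.include?(name)
      # Whitelisted property: the value is not interrogated.
      kept << "#{name}: #{value};"
    elsif SHORTHAND_PREFIXES.include?(name.split('-').first)
      # background-*, border-*, margin-*, padding-*: every whitespace-separated
      # token must be a known keyword or look like a color or a length.
      tokens = value.split
      if tokens.all? { |t| ALLOWED_CSS_KEYWORDS.include?(t.downcase) || t.match?(SAFE_VALUE) }
        kept << "#{name}: #{value};"
      end
    end
  end
  kept.join(' ')
end
```

Fed the `-moz-binding` test from the post, a filter of this shape drops the declaration twice over: the `url(...)` is removed first, and `-moz-binding` is on neither list anyway.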
Read the post SVG Comments
Weblog: Musings
Excerpt: Kicking it up a notch.
Tracked: April 27, 2007 3:23 AM
Read the post HTML5 Sanitizer
Weblog: Sam Ruby
Excerpt: A while back, I commented that I would likely backport Jacques’s sanitizer to Python.  I still haven’t gotten around to that, but I have ported it to html5lib (source, tests. My approach was slightly different.  I ma
Tracked: May 22, 2007 8:02 PM
Read the post XSS 2
Weblog: Musings
Excerpt: Security is a journey, not a destination.
Tracked: September 2, 2007 2:06 AM
Read the post Instiki and Rails 2.0
Weblog: Musings
Excerpt: Shiny.
Tracked: December 24, 2007 12:46 AM

Good filter with PHP

The htmLawed script can provide similar (and more) functionalities for PHP users. It filters XSS code, specified HTML tags/attributes, balances/properly nests elements, and so on.

Posted by: Santosh Patnaik on January 26, 2008 1:30 PM
Read the post Instiki Updates
Weblog: Musings
Excerpt: HTML5lib sucks.
Tracked: May 19, 2008 10:57 AM