Trackbacks and MTStripControlChars
A little interlude in the physics reportage. There’s been some more controversy on the subject of Trackbacks.
A bit of background. The Trackback protocol does not discuss the issue of character encodings. Since it proceeds via an HTTP POST, in the absence of any charset declaration, it ought to be assumed that the charset is ISO-8859-1. But, in point of fact, it could be anything.
The obvious long-term solution is for the Trackback Specification to demand that a charset be declared (explicitly or implicitly) and for implementations (like MovableType) to handle the requisite transcoding to/from your blog’s native charset.
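To make the idea concrete, here is a minimal sketch in Python of what "declare a charset, then transcode" might look like on the receiving end. It is not MovableType code: the function names, the `blog_charset` parameter, and the assumption that `raw` is an already percent-decoded field value are all hypothetical, chosen only for illustration.

```python
from email.message import Message

DEFAULT_CHARSET = "iso-8859-1"  # assumed fallback when nothing is declared


def declared_charset(content_type):
    """Return the charset parameter of a Content-Type header, or the default."""
    msg = Message()
    msg["Content-Type"] = content_type or "application/x-www-form-urlencoded"
    # get_content_charset() returns the lowercased charset parameter, or None.
    return msg.get_content_charset() or DEFAULT_CHARSET


def transcode(raw, content_type, blog_charset="utf-8"):
    """Decode raw bytes with the declared charset, re-encode for the blog."""
    text = raw.decode(declared_charset(content_type))
    # Characters the blog's charset cannot represent are emitted as
    # numeric character references instead of raising an error.
    return text.encode(blog_charset, errors="xmlcharrefreplace")
```

A sender that lies about its charset, or names one Python does not know, will still make `decode` raise, so a real implementation would need a fallback path for that case.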
But we ain’t there yet [1]. Right now, you just have to guess at the trackback’s charset, and try to deal intelligently with the result.
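One common way to guess, sketched below in Python, is to try a strict UTF-8 decode first and fall back to ISO-8859-1 only if that fails. This is an illustrative heuristic, not necessarily what any particular Trackback implementation does, and the function name is made up for the example.

```python
def guess_and_decode(raw):
    """Return (text, charset_guess) for a byte string with no declared charset."""
    try:
        # Non-ASCII text in a legacy 8-bit charset is rarely valid UTF-8,
        # so a clean strict decode is a strong hint that it really is UTF-8.
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a code point, so this never fails,
        # but the result may contain control characters that need stripping.
        return raw.decode("iso-8859-1"), "iso-8859-1"
```

Pure ASCII input decodes identically either way, so the guess only matters once a byte above 0x7F shows up.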
Over a year ago, I wrote a plugin to ensure that data (like a trackback) which is purportedly ISO-8859-1 is really valid. Sam Ruby points out that I did an incomplete job of it. There were still some invalid characters that I accepted. That is, as they say, … unacceptable.
So I’ve revised MTStripControlChars to be really bulletproof.
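For readers who want to see the shape of such a filter, here is a rough Python sketch. The plugin itself is not reproduced here, and the exact set of bytes it rejects is an assumption on my part: the sketch drops the C0 controls other than tab, line feed and carriage return (which XML 1.0 forbids), plus DEL and the range 0x80-0x9F, where ISO-8859-1 assigns no printable characters and where stray windows-1252 curly quotes usually end up.

```python
import re

# Assumed definition of "invalid": C0 controls except \t, \n, \r,
# plus DEL (0x7F) and the C1 range 0x80-0x9F.
_INVALID = re.compile(rb"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f-\x9f]")


def strip_control_chars(raw):
    """Remove bytes that are not printable text in purported ISO-8859-1 data."""
    return _INVALID.sub(b"", raw)
```

For example, `strip_control_chars(b"a\x00b\x93c")` yields `b"abc"`, while ordinary accented text such as `b"caf\xe9"` passes through untouched.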
[1] After waiting around for six months, I finally implemented my own solution. This doesn’t obviate the need for MTStripControlChars, but it does mean that I don’t have to bone-headedly pretend that all trackbacks are ISO-8859-1.
Re: Trackbacks and MTStripControlChars
And even now, you can’t be sure, since an XML parser doesn’t have to support iso-8859-1. However, it is required to support utf-8 and utf-16, so maybe one of those is a better choice.