The infernal apostrophe. Or, why programmers loathe the Irish.

Do you blog using WordPress? Have you tried adding a blogroll link that contains an apostrophe? Is it sometimes preceded by a visible backslash, and other times not? Say it like a pirate: "Aaaargh."

In WordPress's defense, apostrophes are the bane of programmers. The problem is that, in programming expressions, single quotes (as they're referred to in this context) are often used to delimit strings of text. Therefore, to include an apostrophe in such a string, you have to "escape" it, usually with a backslash prefix, but in some cases by typing it twice.
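
For the curious, here is a minimal sketch of those two conventions, written in Python purely for illustration. The sql_quote helper is hypothetical (my own invention, not a library function), and it is only meant to show what "typing it twice" looks like.

```python
# Two common ways to get an apostrophe inside a single-quoted string.

# 1. Backslash prefix (Python, PHP, JavaScript and friends).
name = 'O\'Reilly'

# 2. Doubling the quote (standard SQL string literals).
def sql_quote(text: str) -> str:
    """Hypothetical helper: wrap text as an SQL literal, doubling apostrophes."""
    return "'" + text.replace("'", "''") + "'"

print(name)             # O'Reilly
print(sql_quote(name))  # 'O''Reilly'
```

Either way, the stored value is the same eight characters. The escaping exists only in the source text of whichever language is reading it.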

That system works dandy for monolithic desktop programs. It breaks apart rather dramatically when a single "program" spans several language boundaries. In a typical LAMP web application there may be two or three such boundaries, for example: SQL to PHP to JavaScript (HTML too, but it's less sensitive to malapostrophication). Each language can accommodate the literal apostrophes it has been given, but when it comes time to pass them across the boundary, it must (re)escape them for the next language down the line. As the man in the middle, PHP has to handle the bulk of these translations, and so includes an appropriately named feature called Magic Quotes.
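
A rough sketch of how the visible backslash can come about, with Python standing in for PHP. The add_slashes and strip_slashes helpers are hypothetical and only loosely modelled on the idea of backslash escaping, not a reproduction of what PHP or WordPress actually does. The last few lines show one way to escape once per boundary, using the standard json and html modules.

```python
import html
import json
import re

def add_slashes(text: str) -> str:
    """Hypothetical helper: backslash-escape backslashes and apostrophes."""
    return text.replace("\\", "\\\\").replace("'", "\\'")

def strip_slashes(text: str) -> str:
    """Hypothetical helper: undo exactly one layer of backslash escaping."""
    return re.sub(r"\\(.)", r"\1", text)

name = "O'Reilly"

# Escaped once, unescaped once: all is well.
once = add_slashes(name)
print(strip_slashes(once))   # O'Reilly

# Escaped twice (say, once automatically and once by hand), unescaped once:
# a backslash survives and ends up on the page.
twice = add_slashes(once)
print(strip_slashes(twice))  # O\'Reilly

# Escaping the raw value separately for each destination language.
js_literal = json.dumps(name)              # for a JavaScript string literal
html_attr = html.escape(name, quote=True)  # for an HTML attribute value
print(js_literal)  # "O'Reilly"
print(html_attr)   # O&#x27;Reilly
```

The point of the sketch is only that the number of escapes and unescapes has to match at every boundary, which is exactly the bookkeeping that goes wrong.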

Now I've worked with PHP for some years, and I'm not afraid to admit that I'm never entirely certain what magic quotes will do in any given situation. That they are magical is undisputed.

At one point, I grew so infuriated with apostrophes that I encoded them as HTML entities right from the get-go, and there are still systems out there with &#39; (because &apos; didn't work in IE at one point) littered through the database. Of course, that approach has many failings, not the least of which is calculating the length of strings. And if you want to hook a non-HTML client straight to the database to run reports, well, you can forget that idea, buster.
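
To make the string-length complaint concrete, here is a small Python illustration. The stored value is an assumed example of what such a database column might contain, using the numeric entity for an apostrophe.

```python
import html

stored = "O&#39;Reilly"          # assumed example of an entity-encoded value
decoded = html.unescape(stored)  # what a human would expect to see

print(len(stored))   # 12
print(len(decoded))  # 8
print(decoded)       # O'Reilly
```

Every length check, truncation, and column-width calculation is now off by four characters per apostrophe, unless you remember to decode first.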

There are overly simple solutions to the apostrophe quandary just as there are overly complex solutions (i.e. magic). Aside from tedious care, there doesn't appear to be a middle-of-the-road approach. And mishandling apostrophes is a security issue, not just an aesthetic one.
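
Here is a small sketch of the security side, using Python's built-in sqlite3 module rather than PHP and MySQL. The table, the function names, and the hostile input are all made up for the example. The second function uses a bound parameter, which is one form the tedious care can take, not a claim that it suits every system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE authors (name TEXT)")
conn.executemany("INSERT INTO authors VALUES (?)", [("O'Reilly",), ("Smith",)])

def find_unsafe(name: str) -> list:
    # Pastes the value straight into the SQL text: the apostrophe problem.
    query = "SELECT name FROM authors WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_safe(name: str) -> list:
    # Passes the value as a bound parameter; no escaping by hand.
    return conn.execute(
        "SELECT name FROM authors WHERE name = ?", (name,)
    ).fetchall()

hostile = "x' OR '1'='1"

print(len(find_unsafe(hostile)))  # 2 (every row comes back)
print(find_safe(hostile))         # []
print(find_safe("O'Reilly"))      # [("O'Reilly",)]

try:
    find_unsafe("O'Reilly")       # an innocent Irishman breaks the query
except sqlite3.OperationalError as err:
    print("query failed:", err)
```

Note that the unsafe version fails in two different ways: it hands the whole table to an attacker, and it falls over on a perfectly legitimate surname.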

So wouldn't it just be easier to get rid of all of the O'Reillys, O'Malleys, and O'Rourkes? Nothing violent, of course, just a mass renaming. From what I understand of the Irish, they're very forgiving when asked to discard centuries of custom and tradition...
