16 October 2005

XML Sucks

Pain. Once again, I have had to put structured data in a text file. Once again, I have had to decide whether to use a sane, simple format for the data, knocking up a parser for it in half an hour, or whether to use XML, sacrificing simplicity of code and easy editability of data on the altar of standardisation. Once again, I've had to accept that sanity is out and XML is in.

The objections to XML seem trivial. It's verbose - big deal. It has a pointless distinction between "element content" and "attributes" which adds unnecessary complexity, but not that much unnecessary complexity. It is hideously hard to write a parser for, but who cares? The parsers are written, you just link to one.
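
To make the element/attribute point concrete, here is a small sketch using Python's standard xml.etree.ElementTree module. The person/name record is invented purely for illustration. The same fact can be written two ways, and each way needs a different call to read it back.

```python
import xml.etree.ElementTree as ET

# The same fact, encoded once as an attribute and once as element content.
as_attribute = ET.fromstring('<person name="Ada"/>')
as_element = ET.fromstring('<person><name>Ada</name></person>')

# Reading it back needs a different call for each encoding.
name_from_attribute = as_attribute.get("name")      # attribute lookup
name_from_element = as_element.findtext("name")     # child element text

print(name_from_attribute, name_from_element)
```

Code that reads the data has to know which of the two the writer picked, or check for both.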

The triviality of the objections is put in better context alongside the triviality of the problem which XML solves. XML is a text format for arbitrary hierarchically-structured data. That's not a difficult problem. I firmly believe that I could invent one in 15 minutes, and implement a parser for it in 30, and that it would be superior in every way to XML. If a solution to a difficult problem has trivial flaws, that's acceptable. If a solution to a trivial problem has trivial flaws, that's unjustifiable.

And yet XML proliferates. Why?

Since the only distinctive thing about it is its sheer badness, that is probably the reason. Here's the mechanism: There was a clear need for a widely-adopted standard format for arbitrary hierarchically-structured data in text files, and yet, prior to XML, none existed. Plenty of formats did exist, most of them clearly superior to XML, but none had the status of a standard.

Why not? Well, because the problem is so easy. It's easier to design and implement a suitable format than to find, download and learn the interface to someone else's. Why use someone else's library for working with, say, Lisp S-expressions when you could write your own just as easily, and have it customised precisely to your immediate needs? So no widely-used standard emerged.
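
As a rough illustration of how little code such a format needs, here is a minimal sketch of an S-expression reader in Python. It is one possible design, not a reference implementation: the token rules (parentheses, double-quoted strings with backslash escapes, bare atoms) and the names TOKEN and parse are assumptions made for this example.

```python
import re

# Tokens: "(", ")", a double-quoted string, or a bare atom.
# Characters that match none of these (whitespace, a stray quote) are skipped.
TOKEN = re.compile(r'\(|\)|"(?:[^"\\]|\\.)*"|[^\s()"]+')

def parse(text):
    """Parse one S-expression into nested Python lists and strings."""
    tokens = TOKEN.findall(text)
    pos = 0

    def read():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of input")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while pos < len(tokens) and tokens[pos] != ")":
                items.append(read())
            if pos >= len(tokens):
                raise ValueError("missing ')'")
            pos += 1  # consume the closing ")"
            return items
        if tok == ")":
            raise ValueError("unexpected ')'")
        if tok.startswith('"'):
            # Strip the quotes and resolve backslash escapes.
            return re.sub(r'\\(.)', r'\1', tok[1:-1])
        return tok

    result = read()
    if pos != len(tokens):
        raise ValueError("trailing input after the expression")
    return result

record = parse('(person (name "Ada Lovelace") (langs en fr))')
print(record)
```

The result is plain nested lists, so walking the hierarchy takes ordinary indexing and loops rather than a dedicated API.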

On the other hand, if you want something like XML, but with a slight variation, you'd have to spend weeks implementing its insanities. It's not worth it - you'd be better off using Xerces and living with it. Therefore XML is a standard, when nothing else has been.

This is not the "Worse is Better" argument - it's almost the opposite. The original Richard Gabriel argument is that a simple half-solution will spread widely because of its simplicity, while a full solution will be held back by its complexity. But that only applies to complex problems. In hierarchical data formats, there is no complex "full solution" - the simple solutions are also full. That is why we went so long without one standard. "Worse is Better" is driven by practical functionality over correctness. "Insane is Better" is driven by the (real) need for standardisation over practical functionality, and therefore the baroque drives out the straightforward. Poor design is XML's unique selling point.
