I never really got the 'why' behind XML Schemas. Why would you validate a docume...

RodgerTheGreat · on Nov 14, 2010

If you have multiple applications using a document format, XML schemas are a good way to make sure they all agree on the semantics of the format. Recall that XML was originally intended as a means of bridging different environments and platforms, where code will not necessarily be portable.

andrewvc · on Nov 15, 2010

I guess, but even if that's the case, who cares if the message is formatted right if it's still invalid?

I haven't really worked in a heterogeneous environment with a lot of XML before, though. I can see the potential value there, but I'm still doubtful.

zdw · on Nov 15, 2010

The big win for schemas is being able to use them to make assumptions about input, which simplifies your code.

For example, say you have a RelaxNG schema (which has a great compact syntax) that says an element has to have at least one child node and that each child node contains an integer within a certain range. Once you have that schema, you can write code that reads in the XML file and validates it against the schema in 2 lines, then grabs all the child node integers with one XPath expression.

The data might be junk (heck, I'm not aware of any format that is impervious to worthless data), but at least it's junk in the right format, and you never had to mess around with parsing the input.
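
A minimal sketch of that validate-then-XPath workflow, assuming Python with the third-party lxml library. The `readings`/`value` element names and the 0 to 100 range are hypothetical, picked only for illustration, and the schema is written in RelaxNG's XML syntax because lxml does not read the compact syntax directly.

```python
from lxml import etree

# Hypothetical format: <readings> holds one or more <value> children,
# each an integer from 0 to 100 inclusive.
SCHEMA = etree.RelaxNG(etree.fromstring(b"""
<element name="readings" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <oneOrMore>
    <element name="value">
      <data type="integer">
        <param name="minInclusive">0</param>
        <param name="maxInclusive">100</param>
      </data>
    </element>
  </oneOrMore>
</element>
"""))


def read_values(xml_bytes):
    """Parse and validate a document, then return its integers."""
    doc = etree.fromstring(xml_bytes)
    SCHEMA.assertValid(doc)  # raises etree.DocumentInvalid on failure
    # After validation, every <value> is known to hold an integer,
    # so int() needs no defensive checks here.
    return [int(v) for v in doc.xpath("/readings/value/text()")]
```

Calling `read_values(b"<readings><value>5</value><value>42</value></readings>")` gives back the two integers, while a document with no `value` children or an out-of-range number is rejected before any application code touches it.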

Need to switch programming environments or languages because you're working on Unix/Windows/embedded system/mainframe/database/web browser? The schema can move with you (or be converted to another schema format that does), and programming niceties like SAX and XPath will often carry over too.

andrewvc · on Nov 15, 2010

I can totally see the value now, even though I haven't used XML in heterogeneous environments much.

Thanks for the careful response!

prodigal_erik · on Nov 14, 2010

Without a neutral schema language, the validation rules need to be reimplemented in every programming language used by anyone who works with the format. And there's almost no chance they will exactly agree on which documents are valid or not, because you're either playing telephone between one implementation and the next or just starting from a vague schema in prose.

jhrobert · on Nov 15, 2010

When you say "every programming language" I assume you mean "JavaScript"?

I would not be surprised if embedding a JavaScript interpreter and running a validation script were already easier in most languages than embedding an XML parser & validator.

JavaScript has become the lingua franca of programming. I hope they teach that in schools. http://en.wikipedia.org/wiki/Lingua_franca

prodigal_erik · on Nov 15, 2010

I would expect a JavaScript interpreter to be even slower than a streaming XML parser. And one good thing about XML Schema is that it doesn't require a DOM; in fact, it doesn't say anything about representing document data in memory. This is also true of the JSON Schema draft. But hand-coded JavaScript validation rules aren't likely to be context-free, and we'd constantly be fighting people who assume everyone can afford to store the entire document as one huge DOM-like map of arbitrary keys and values of unknown types. It's possible to build a validator on top of something like http://jackson.codehaus.org/, but I don't believe the industry has the diligence to do it that way.
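
To illustrate the point about not needing the whole document in memory, here is a rough sketch of a hand-coded streaming check. It uses Python's standard `xml.etree.ElementTree.iterparse` rather than JavaScript or Jackson, and it assumes a hypothetical format in which `value` elements hold integers in a fixed range. It is not a schema validator, only an example of checking each element as it arrives.

```python
import io
import xml.etree.ElementTree as ET


def stream_values(source, low=0, high=100):
    """Yield each <value> integer as it is parsed.

    Raises ValueError on a non-integer, an out-of-range number,
    or a document with no <value> elements at all.
    """
    seen = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "value":
            continue
        n = int(elem.text or "")  # ValueError if empty or not an integer
        if not low <= n <= high:
            raise ValueError(f"value {n} is outside {low}..{high}")
        seen += 1
        yield n
        # Discard the element's text and children once checked. The parent
        # still references the emptied element, so memory use is reduced
        # but not strictly constant.
        elem.clear()
    if seen == 0:
        raise ValueError("expected at least one <value> element")
```

Even this small check encodes the rules in one language's code, so every other consumer of the format would have to rewrite it by hand.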

jhrobert · on Nov 15, 2010

"Worse is Better" says it much better than I can: the dynamics of success are not what we may first think they are.

From one point of view, JavaScript is the "Worse is Better" version of Lisp. Guess who's winning?

Ditto XML vs JSON

aboodman · on Nov 15, 2010

Also, writing validation code by hand leads to a lot of duplication and room for error.