iPhone: libxml2 & RELAX NG validation

Fri, 28. May 2010

Categories: en development Tags: Apple Cocoa iPhone libxml2 NSXMLParser RELAX NG SAX schema validate W3C XML xmllint xmlTextReader

Having a validating parser in place can greatly reduce the code needed to parse XML, because you know exactly what you will get. As mentioned in my last post about RELAX NG & trang, I prefer RELAX NG over W3C XML Schema, though that doesn’t matter much here because Apple’s suggested XML parser doesn’t validate at all.

So we have to go one level deeper and have a look at libxml2.

Apple’s example “XmlPerformance” helped me get started, but didn’t do the trick: libxml2 supports validation for an xmlDocPtr or an xmlTextReader, but not for SAX parsers as used in that example.

The libxml2 examples didn’t help me much either, but luckily xmllint is available as source (OSS just rocks) and does almost what we want. It first parses the XML into an xmlDocPtr and validates afterwards, and it does so for a reason:

You can have a validating xmlTextReader (via xmlTextReaderRelaxNGSetSchema), but it won’t detect IDREFs whose referenced ID is missing, and its error messages lack the name of the failing item. BTW: when validating against a W3C schema, this ID/IDREF check isn’t available yet.

I finally dropped streaming XML parsing in favour of validation and “push” parsing (nice for data coming in over the wire), and did the following:

  1. load the RELAX NG regular form schema (watch out for the assignment of relaxngschemas) – similar to xmllint schema loading,
  2. push the raw XML data into an xmlDocPtr (xmlCreatePushParserCtxt), exactly like xmllint,
  3. validate the in-memory document (xmlRelaxNGValidateDoc),
  4. turn it into an xmlTextReader,
  5. process the reader.

Wrap up:

I may prepare and publish a MroLibxml2Parser inheriting from NSXMLParser and firing its callbacks, so that validating and non-validating parser implementations can be swapped easily, but this has to wait a bit. Stay tuned.