For example, when working with RESTful APIs you may want to validate the response XML, which in most cases is a custom format.
I typically use tools already installed on every Mac: I fire an HTTP GET request with curl and immediately check the response with xmllint, like this:
$ curl http://www.heise.de/newsticker/heise-atom.xml | xmllint --format --schema myschema.xsd -
But I just don’t like creating and editing W3C XML Schemas: the notorious angle brackets hurt my eyes, and the redundant element names bury the real content under tons of repetitive text. Nor do I like clicking through graphical schema editors and getting lost hunting for hidden settings and property dialogs.
A minimal and naive schema validating the above example Atom feed (and simply created from the feed itself with trang, see below) as W3C Schema looks like this:
This is where RELAX NG comes in, especially its “compact form”, which is just what I like: a concise, BNF-ish syntax. It was designed by Murata Makoto and James Clark, who was Technical Lead of the XML Working Group back when XML was created and is the father of the famous expat parser.
Written in RELAX NG, the very same schema boils down to half the lines and about a third of the characters, without a single angle bracket:
default namespace = "http://www.w3.org/2005/Atom"

start =
  element feed {
    title,
    element subtitle { text },
    link+,
    updated,
    element author {
      element name { text }
    },
    id,
    element entry { title, link, id, updated }+
  }
title = element title { text }
link =
  element link {
    attribute href { xsd:anyURI },
    attribute rel { xsd:NCName }?
  }
updated = element updated { xsd:dateTime }
id = element id { xsd:anyURI }

And since libxml2, and therefore xmllint, supports RELAX NG in its regular (XML) form, you can validate just as in the beginning, but with a much more editable schema:
$ curl http://www.heise.de/newsticker/heise-atom.xml | xmllint --format --relaxng myschema.rng -
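If the check should run from a script rather than a one-off pipeline, the same call can be wrapped in a few lines. This is only a sketch: the helper names `xmllint_command`, `validate` and `fetch` are made up for illustration, and it assumes xmllint is on the PATH and that myschema.rng exists. It adds `--noout` so that only xmllint's diagnostics are reported instead of the echoed document.

```python
import subprocess
import urllib.request


def xmllint_command(schema_path):
    """Build the xmllint argument list for RELAX NG validation of stdin."""
    # --noout suppresses the echoed document; "-" makes xmllint read stdin.
    return ["xmllint", "--noout", "--relaxng", schema_path, "-"]


def validate(xml_bytes, schema_path):
    """Return (ok, messages); ok is True when xmllint exits with status 0."""
    result = subprocess.run(
        xmllint_command(schema_path),
        input=xml_bytes,
        capture_output=True,
    )
    # xmllint writes its validation report to stderr.
    return result.returncode == 0, result.stderr.decode(errors="replace")


def fetch(url):
    """Return the body of a plain HTTP GET request as bytes."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()
```

Calling `validate(fetch("http://www.heise.de/newsticker/heise-atom.xml"), "myschema.rng")` then mirrors the curl pipeline above and hands back a boolean plus whatever xmllint printed.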
trang
trang is a schema converter for RELAX NG, written in Java, which I wrapped in a small shell script:
#!/bin/sh
java -jar "$(dirname "$0")/trang-20090818/trang.jar" "$@"
Writing a new schema from scratch can be much more convenient if you have a bunch of XML files you can feed into trang:
$ trang *.xml myschema.rnc
Then refine the resulting schema in compact form and finally turn it into the regular form:
$ trang myschema.rnc myschema.rng
Trang also serves me as a schema indenter by converting from compact to regular and back.
BUT: trang converts RELAX NG into W3C XML Schema, but not vice versa.
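The trang steps above (infer from samples, convert to the regular form, re-indent by round-tripping) can be collected in one place. A rough sketch, assuming the wrapper script is installed on the PATH under the name `trang`; the function names are invented for this example, and it relies on trang's default behaviour of picking input and output formats from the file extensions.

```python
import subprocess


def infer_schema_command(samples, stem):
    """trang call that infers a compact schema from sample XML documents."""
    return ["trang", *samples, f"{stem}.rnc"]


def to_regular_form_command(stem):
    """trang call that converts the compact schema into the regular (XML) form."""
    return ["trang", f"{stem}.rnc", f"{stem}.rng"]


def reindent_commands(stem):
    """Round trip compact -> regular -> compact, which re-indents the .rnc file."""
    return [
        ["trang", f"{stem}.rnc", f"{stem}.rng"],
        ["trang", f"{stem}.rng", f"{stem}.rnc"],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_all(reindent_commands("myschema"))` overwrites myschema.rnc with trang's own formatting, so keep the file under version control before trying it.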
Deep validation
Validating XML documents shouldn’t stop at elements and attributes. It should also leverage XML Schema Datatypes and apply, for example, regular expressions:
element uuid {
  xsd:string {
    ## A UUID
    pattern =
      "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
  }
}
or range constraints:
element year {
  xsd:unsignedShort { minInclusive = "1900" maxInclusive = "2100" }
}
P.S.: For a more complete Atom RELAX NG schema see here or ask your search engine of choice.
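To get a feeling for what the pattern and range facets from the “Deep validation” examples accept before wiring them into a schema, they can be approximated in plain Python. This is a sketch, not a replacement for a validator: `is_uuid` and `is_valid_year` are hypothetical helpers, Python's `re` dialect is not identical to XML Schema regular expressions (for this particular pattern the two agree), and the year check only accepts unsigned digit strings.

```python
import re

# Same pattern as in the schema. XML Schema patterns implicitly match the
# whole value, hence re.fullmatch rather than re.search.
UUID_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def is_uuid(value):
    """True if the whole string matches the UUID pattern."""
    return UUID_PATTERN.fullmatch(value) is not None


def is_valid_year(value, low=1900, high=2100):
    """Approximate xsd:unsignedShort with minInclusive/maxInclusive facets."""
    text = value.strip()  # numeric XSD types ignore surrounding whitespace
    if not re.fullmatch(r"[0-9]+", text):
        return False
    return low <= int(text) <= high
```

Both bounds are inclusive, so "1900" and "2100" pass while "1899" and "2101" do not.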
