XML Toolbox: RELAX NG & trang

When handling RESTful APIs, for example, you may want to validate the response XML, which in most cases uses a custom format.

I typically use tools that are already installed on every Mac: I fire an HTTP GET request with curl and immediately check the response with xmllint, like this:

$ curl http://www.heise.de/newsticker/heise-atom.xml | xmllint --format --schema myschema.xsd -

But I just don’t like creating and editing W3C XML Schemas: the notorious angle brackets hurt my eyes, and the redundant element names bury the real content under tons of repetitive text. Nor do I like clicking through graphical schema editors and getting lost hunting for hidden settings and property dialogs.

A minimal, naive W3C Schema that validates the example Atom feed above (generated from the feed itself with trang, see below) looks like this:

[Image: Naive Atom W3C Schema]

This is where RELAX NG comes in, especially its “compact form”, which is just what I like: a concise, BNF-ish syntax. It was designed by Murata Makoto and James Clark, who was Technical Lead of the XML Working Group back when XML was created and is the father of the famous expat parser.

Expressed in RELAX NG, the very same schema boils down to ½ the lines and about ⅓ of the characters, without a single angle bracket:

default namespace = "http://www.w3.org/2005/Atom"
 
start =
  element feed {
    title,
    element subtitle { text },
    link+,
    updated,
    element author {
      element name { text }
    },
    id,
    element entry { title, link, id, updated }+
  }
title = element title { text }
link =
  element link {
    attribute href { xsd:anyURI },
    attribute rel { xsd:NCName }?
  }
updated = element updated { xsd:dateTime }
id = element id { xsd:anyURI }

And since libxml2, and therefore xmllint, supports RELAX NG, you can validate just like in the beginning, but with a much more editable schema. Note that xmllint reads the regular (XML) form of RELAX NG, not the compact form:

$ curl http://www.heise.de/newsticker/heise-atom.xml | xmllint --format --relaxng myschema.rng -
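For use in scripts, the two xmllint invocations shown so far can be wrapped so that the exit status decides what happens next. This is only a minimal sketch: the function name `validate_xml` is made up for illustration, and it assumes xmllint is on the PATH. It picks `--schema` or `--relaxng` based on the schema file's extension and adds `--noout` so the document itself is not echoed.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): validate the XML on stdin
# against a schema file. Usage: validate_xml myschema.rng < response.xml
validate_xml() {
  schema=$1
  case "$schema" in
    *.xsd) flag=--schema ;;    # W3C XML Schema
    *.rng) flag=--relaxng ;;   # RELAX NG, regular (XML) form
    *)     echo "unsupported schema type: $schema" >&2; return 2 ;;
  esac
  # --noout suppresses the document; the exit status is 0 only if it is valid.
  xmllint --noout "$flag" "$schema" -
}
```

A call could then look like `curl -fsS http://www.heise.de/newsticker/heise-atom.xml | validate_xml myschema.rng && echo OK`. A compact-form `.rnc` file is rejected on purpose, since it has to be converted to the regular form first.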

trang

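Trang is a schema converter for RELAX NG, written in Java, which I wrapped in a small shell script:

#!/bin/sh
# Run the trang jar located next to this script, passing all arguments through
exec java -jar "$(dirname "$0")/trang-20090818/trang.jar" "$@"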

Writing a new schema from scratch is much more convenient if you have a bunch of sample XML files you can feed into trang:

$ trang *.xml myschema.rnc

Then refine the resulting schema in compact form and finally turn it into the regular form:

$ trang myschema.rnc myschema.rng

Trang also serves me as a schema indenter by converting from compact to regular and back.

BUT: trang converts RELAX NG into W3C XML Schema, but not vice versa.
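The indenter trick mentioned above (compact to regular and back) can be sketched as a small function. The name `reindent_rnc` is hypothetical, and the sketch assumes a `trang` command such as the wrapper script shown earlier. Trang infers the formats from the file extensions, so the temporary files need proper `.rng` and `.rnc` names.

```shell
#!/bin/sh
# Hypothetical helper: re-indent a compact-form schema by converting it
# to the regular (XML) form and back. Assumes `trang` is callable.
reindent_rnc() {
  rnc=$1
  tmpdir=$(mktemp -d) || return 1
  # Write to temporary files first, so a failed conversion
  # never clobbers the original schema.
  trang "$rnc" "$tmpdir/schema.rng" &&
    trang "$tmpdir/schema.rng" "$tmpdir/schema.rnc" &&
    mv "$tmpdir/schema.rnc" "$rnc"
  status=$?
  rm -rf "$tmpdir"
  return $status
}
```

Calling `reindent_rnc myschema.rnc` replaces the file only if both conversions succeed. The sketch assumes a single-file schema that does not include other files.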

Deep validation

Validating XML documents shouldn’t stop at elements and attributes; it should also leverage XML Schema Datatypes and apply, for example, regular expressions:

  element uuid {
    xsd:string {
 
      ## A UUID
      pattern =
        "[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    }
  }

or range constraints:

  element year {
    xsd:unsignedShort { minInclusive = "1900" maxInclusive = "2100" }
  }
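To see such a facet in action with xmllint, the range constraint can be tried against single values. The following is a sketch under assumptions: the function name `check_year` is invented, and the schema in the here-document is a hand-written regular-form equivalent of the compact `year` pattern above, not trang output.

```shell
#!/bin/sh
# Hypothetical helper: check one value against the year range constraint.
# Usage: check_year 2010
check_year() {
  tmpdir=$(mktemp -d) || return 1
  # Regular (XML) form of: element year { xsd:unsignedShort { ... } }
  cat > "$tmpdir/year.rng" <<'EOF'
<element name="year" xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="unsignedShort">
    <param name="minInclusive">1900</param>
    <param name="maxInclusive">2100</param>
  </data>
</element>
EOF
  # Wrap the value in a <year> element and validate it from stdin.
  printf '<year>%s</year>\n' "$1" |
    xmllint --noout --relaxng "$tmpdir/year.rng" -
  status=$?
  rm -rf "$tmpdir"
  return $status
}
```

With a libxml2 build that includes RELAX NG support, `check_year 2010` should succeed, while `check_year 1800` should return a non-zero status together with an error message from xmllint.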

P.S.: For a more complete Atom RELAX NG schema see here or ask your search engine of choice.


