There’s a W3C specification for compressing XML, the „Efficient XML Interchange“ (EXI) Format, which takes into account the grammar and the likelihood of atoms within the document. They indeed use something similar to Huffman coding.
The results are quite impressive – nice charts!
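To illustrate the "likelihood of atoms" idea, here is plain textbook Huffman coding over token frequencies. It is not the EXI encoding itself, which the spec defines in its own terms. The function name `huffman_code` and the sample token stream are hypothetical.

```python
import heapq
from collections import Counter

def huffman_code(tokens):
    """Hypothetical helper: build a Huffman prefix code so that
    frequent tokens get shorter bit strings (plain Huffman, not EXI)."""
    freq = Counter(tokens)
    if len(freq) == 1:
        # Degenerate case: a single distinct token still needs one bit.
        (only,) = freq
        return {only: "0"}
    # Heap entries: (weight, unique tie-breaker, {token: code suffix so far})
    heap = [(w, i, {t: ""}) for i, (t, w) in enumerate(freq.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least likely subtrees, prefixing 0 and 1 respectively.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in left.items()}
        merged.update({t: "1" + c for t, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

# Hypothetical token stream: "item" is much more likely than "price".
stream = ["item"] * 4 + ["name"] * 2 + ["price"]
code = huffman_code(stream)
bits = "".join(code[t] for t in stream)
print(code, len(bits))
```

Here "item" gets a 1-bit code and the rarer tokens get 2 bits, so the seven tokens take 10 bits instead of the 14 a fixed 2-bit code would need.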