xml-serialization binary protocol-buffers thrift exi

Advice on choosing between different binary XML tools


My requirement is to compress an XML file into a binary format, transmit it, and decompress it (lightning fast) before I start parsing it.

There are quite a few binary XML protocols and tools available. I found EXI (Efficient XML Interchange) better than the others. I tried its open source implementation, EXIficient, and found it good.

I have heard about Google Protocol Buffers and Facebook's Thrift. Can anyone tell me whether these two can do the job I am looking for?

Or just let me know if there is anything better than EXI that I should look at.

Also, there is a good XML parser, VTD-XML (I haven't tried it myself, just googled it and read some articles), that reportedly achieves better parsing performance than DOM, SAX and StAX.

I want the best of both worlds: best compression plus best parsing performance. Any suggestions?

One more thing regarding EXI: how can EXI claim to be fast at parsing a decoded XML file when that file is still being parsed by DOM, SAX or StAX? I would have believed this if there were a separate binary parser for reading the decoded version. Correct me if I am wrong.

Also, is there a good open source C++ implementation of the EXI format? A Java version is available as EXIficient, but I have not been able to find an open source C++ implementation.

There is one by AgileDelta, but that's commercial.


Solution

  • You mention protocol buffers (protobuf); this is a binary format, but it has no direct relationship to XML. In particular, no member names (element names / attribute names / namespaces) are encoded; it is just the data (with numeric markers for identifiers).

    As such, you cannot reconstruct arbitrary XML from a protobuf stream unless you already know how to map "field 3" etc.

    However! If you have an object-model that works with both XML and protobuf, the transform is trivial; deserialize with either - serialize with either. How well this works depends on the implementation; for example, it is trivial with protobuf-net and is actually how I do the codegen (load the binary; write as XML; run the XML through an xslt layer to emit code).

    If you actually just want to transfer object data (and XML is just a proposed implementation detail), then I thoroughly recommend protobuf: it is platform independent, has a wide range of implementations, is version-tolerant, produces very small output, and is very fast to process at both read and write.
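
To make the "field 3" point concrete, here is a rough sketch in plain Python (standard library only, not the official protobuf library) that hand-encodes two fields in the protobuf wire format and then rebuilds XML from them. The `SCHEMA` mapping, the element names, and the helper functions are hypothetical and exist only for illustration; it handles just varint and length-delimited string fields.

```python
import xml.etree.ElementTree as ET


def encode_varint(n: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(buf: bytes, pos: int):
    """Decode a varint starting at pos; return (value, next position)."""
    result, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def encode_field(number: int, value) -> bytes:
    """Encode an int (wire type 0) or a str (wire type 2). No names are written."""
    if isinstance(value, int):
        return encode_varint((number << 3) | 0) + encode_varint(value)
    data = value.encode("utf-8")
    return encode_varint((number << 3) | 2) + encode_varint(len(data)) + data


def decode_message(buf: bytes):
    """Return a list of (field number, value) pairs; names are not recoverable."""
    pos, fields = 0, []
    while pos < len(buf):
        tag, pos = decode_varint(buf, pos)
        number, wire_type = tag >> 3, tag & 7
        if wire_type == 0:
            value, pos = decode_varint(buf, pos)
        elif wire_type == 2:
            length, pos = decode_varint(buf, pos)
            value = buf[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        fields.append((number, value))
    return fields


# Hypothetical schema: the knowledge that lives outside the binary stream.
SCHEMA = {1: "id", 3: "name"}


def to_xml(fields, root_name: str, schema: dict) -> str:
    """Rebuild XML from numbered fields using the external name mapping."""
    root = ET.Element(root_name)
    for number, value in fields:
        ET.SubElement(root, schema[number]).text = str(value)
    return ET.tostring(root, encoding="unicode")


payload = encode_field(1, 150) + encode_field(3, "Ada")
print(payload.hex())                                      # 8 bytes on the wire
print(decode_message(payload))                            # numbers and values only
print(to_xml(decode_message(payload), "person", SCHEMA))  # names come from SCHEMA
```

Without `SCHEMA`, the decoder can only report that field 1 holds 150 and field 3 holds "Ada"; the element names have to come from somewhere else. In real use you would let a protobuf implementation generate this code from a .proto definition rather than writing it by hand.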