hadoop, serialization, protocol-buffers, thrift, avro

Thrift, Avro, Protocol Buffers - Are they all dead?


While working on a pet project (Cassandra, Spark, Hadoop, Kafka), I need a data serialization framework. Looking at the three common frameworks, namely Thrift, Avro and Protocol Buffers, I noticed that most of them seem barely alive, with two minor releases a year at most.

This leaves me with two assumptions:

  • They are as complete as such a framework needs to be and simply sit in maintenance mode as long as no new features are needed.
  • There is no longer a reason for such frameworks to exist, for reasons that are not obvious to me. If so, what alternatives are out there?

If anyone can shed light on these assumptions, any input is welcome.


Solution

  • The advantage of Thrift over Protobuf is that Thrift offers a complete RPC and serialization framework. In addition, Thrift supports more than 20 target languages, and that number is still growing. We are about to include .NET Core, and Rust support will follow in the not-too-distant future.

    It is true that there have not been many Thrift releases in recent months; that needs to be addressed, and we are fully aware of it. On the other hand, the overall stability of the codebase is quite good, so one can also fork the project on GitHub and cut one's own branch from current master, applying the usual quality measures, of course.

    The main difference between Avro and Thrift is that Thrift is statically typed, while Avro uses a more dynamic approach. In most cases a static approach fits the needs quite well, and then Thrift lets you benefit from the better performance of generated code. If it does not, Avro might be more suitable.

    It is also worth mentioning that besides Thrift, Protobuf and Avro there are other solutions on the market, such as Cap'n Proto or BOLT.
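To make the static versus dynamic distinction mentioned above more concrete, here is a rough toy sketch using only the Python standard library. It does not use the real Thrift or Avro APIs or wire formats; the `User` class, the `SCHEMA_JSON` layout and the `encode_dynamic` / `decode_dynamic` helpers are all hypothetical names invented for illustration. The first half mimics what generated code tends to look like (field layout fixed when the code is written), the second half mimics a codec that interprets a schema loaded at runtime.

```python
import json
import struct
from dataclasses import dataclass


# Static style: the field layout is baked into the class, roughly the way
# generated code would do it. (Hypothetical class, not real Thrift output.)
@dataclass
class User:
    user_id: int
    name: str

    def encode(self) -> bytes:
        raw = self.name.encode("utf-8")
        # 8-byte signed id, 4-byte length prefix, then the UTF-8 bytes
        return struct.pack(">qI", self.user_id, len(raw)) + raw

    @classmethod
    def decode(cls, data: bytes) -> "User":
        user_id, length = struct.unpack_from(">qI", data, 0)
        start = struct.calcsize(">qI")
        return cls(user_id, data[start:start + length].decode("utf-8"))


# Dynamic style: a schema read at runtime drives encoding and decoding.
# (Hypothetical schema layout, only loosely inspired by JSON schemas.)
SCHEMA_JSON = """
{"name": "User",
 "fields": [{"name": "user_id", "type": "long"},
            {"name": "name", "type": "string"}]}
"""


def encode_dynamic(schema: dict, record: dict) -> bytes:
    out = bytearray()
    for field in schema["fields"]:
        value = record[field["name"]]
        if field["type"] == "long":
            out += struct.pack(">q", value)
        elif field["type"] == "string":
            raw = value.encode("utf-8")
            out += struct.pack(">I", len(raw)) + raw
        else:
            raise ValueError("unsupported type: " + field["type"])
    return bytes(out)


def decode_dynamic(schema: dict, data: bytes) -> dict:
    record, offset = {}, 0
    for field in schema["fields"]:
        if field["type"] == "long":
            (record[field["name"]],) = struct.unpack_from(">q", data, offset)
            offset += 8
        elif field["type"] == "string":
            (length,) = struct.unpack_from(">I", data, offset)
            offset += 4
            record[field["name"]] = data[offset:offset + length].decode("utf-8")
            offset += length
        else:
            raise ValueError("unsupported type: " + field["type"])
    return record


schema = json.loads(SCHEMA_JSON)
static_bytes = User(7, "ada").encode()
dynamic_bytes = encode_dynamic(schema, {"user_id": 7, "name": "ada"})
```

In this toy both paths produce the same bytes. The difference is where the type knowledge lives: the static path has no per-field branching at runtime, which is one plausible source of the performance benefit mentioned above, while the dynamic path can handle a schema it has never seen without regenerating any code.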