Search code examples
jsonserializationthriftmicroservicesdata-interchange

What are the practical disadvantages of using strongly typed data interchange format (eg thrift / capn proto) in a microservices context?


I'm thinking of introducing a strongly typed (read - with predefined schema) data interchange format for communication between our internal services. For example, I guess something like Thrift or Cap'n Proto.

At least two obvious advantages (to me) of using this over something like JSON is that

  1. you would KNOW the exact format of the data the service can expect (so leaves less room for ambiguity and errors while communicating) and
  2. the implementation generally deserializes the raw message for you and it provides methods for accessing the objects.

What are the practical disadvantages for going this route, versus something like JSON?

For context - our system consists of services written in python and java - and possibly other languages in the future, and communicates via HTTP endpoints between services and message brokers like rabbitmq.


Solution

  • As with every strongly typed system, one of the major advantanges is without a doubt that if you make mistakes, it fails early in the process, typically at the compilation stage, which is a good thing.

    Second biggest advantage IMHO is what you already said: because the fields and types are well known, the compiler, libraries and related code know what data to expect and thus can be written/organized in a more efficient manner - or in short: performance.

    In contrast, a losely typed system (like Avro), while allowing for much greater flexibility without the need of recompiling, comes with the other side of the same coin: the downside of being prone to errors regarding the contents of the message at runtime.

    This is because a losely defined system defines only the syntax of a valid document (like for example XML) and leaves the message-level semantics of what's in the document up to the upper layers. A strongly typed system has the knowledge about those message-level semantics already built in at compile time. Therefore, it is easy to detect/decide whether a particular document or message is not only well-formed but valid with regard to the message contents. If you need to do the same with the losely defined system, you need to provide additional information at runtime (like XML schema) and validate your document against it.

    Bottom line

    What system you prefer is more or less a matter of taste in most cases. I'd make the decision based on the question, how variable the data are that I have to deal with. If it makes sense to use a strongly typed system, I'd go that way, because I like it very much to get informed about errors and mistakes early.

    However, if there is a need for very flexible data structures, it may make more sense to go the other road. Although designing a losely typed schema on top of a strongly typed system is surely possible, it is somewhat contradicting and you'll end up with some overly complicated, while overly generic, thing.