
Hadoop's own serialization and its relationship with Avro serialization?


I am trying to understand Avro and learned that it is one of the data serialization frameworks that Hadoop uses.

While learning Hadoop, I also learned that Hadoop uses its own serialization framework rather than Java's built-in serialization, which is why I see Writable and WritableComparable in Hadoop.
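For reference, a Writable is just a class that writes its own fields to a DataOutput and reads them back from a DataInput. The sketch below is illustrative only: it declares a local SimpleWritable interface that mirrors the two methods of org.apache.hadoop.io.Writable so it compiles without Hadoop on the classpath, and PageView is a hypothetical record type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

class WritableRoundTrip {
    public static void main(String[] args) throws IOException {
        PageView original = new PageView("/index.html", 42L);
        byte[] bytes = serialize(original);
        PageView copy = new PageView();
        deserialize(bytes, copy);
        // prints "/index.html 42 (21 bytes)"
        System.out.println(copy.url + " " + copy.hits + " (" + bytes.length + " bytes)");
    }

    // Serialize a record by letting it write its own fields to a DataOutput.
    static byte[] serialize(SimpleWritable w) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            w.write(out);
        }
        return buffer.toByteArray();
    }

    // Populate an existing (reusable) instance from bytes.
    static void deserialize(byte[] bytes, SimpleWritable w) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            w.readFields(in);
        }
    }
}

// Local stand-in for org.apache.hadoop.io.Writable (assumption: declared here only
// so the sketch runs without Hadoop; the real interface has these same two methods).
interface SimpleWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical record: fields are written in a fixed order, with no names or schema.
class PageView implements SimpleWritable {
    String url;
    long hits;

    PageView() {}  // no-arg constructor, since Hadoop typically creates Writables via reflection
    PageView(String url, long hits) { this.url = url; this.hits = hits; }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(url);   // 2-byte length prefix + modified UTF-8 bytes
        out.writeLong(hits); // 8 bytes
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        hits = in.readLong();
    }
}
```

The serialized form is only the raw field values in the order write() emits them, which is what makes Writables compact but also schema-less.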

Now, while reading about Avro, I see it described as a serialization framework as well.

I am a bit confused by this. When we say Hadoop's own serialization framework, are we referring to Avro or to something else that is built into Hadoop itself?

Can anyone help me understand this?


Solution

  • Hadoop Writables are not Avro; they are the "something else" built into Hadoop.

    Avro is a separate project, and its schema model allows for nested structures and schema evolution. Hadoop's Writable serialization has no concept of schema evolution, as far as I know.

    Thrift is another row-oriented serialization format commonly found in Hadoop projects.

    Other (columnar) data storage formats include Parquet and ORC.
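To make the schema-evolution point concrete, here is a small sketch using plain java.io streams in the same style a Writable uses. A hypothetical "version 1" writer emits (url, hits); a hypothetical "version 2" reader expects an extra referrer field. Because the bytes carry no schema or version marker, the new reader simply runs off the end of the old record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;

class NoSchemaEvolutionDemo {
    public static void main(String[] args) throws IOException {
        byte[] v1Bytes = writeV1("/index.html", 42L);
        // prints "EOF: old record has no referrer field"
        System.out.println(readV2(v1Bytes));
    }

    // Hypothetical version 1 of the record: url, then hits, written positionally.
    static byte[] writeV1(String url, long hits) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(url);
            out.writeLong(hits);
        }
        return buffer.toByteArray();
    }

    // Hypothetical version 2 adds a trailing "referrer" field. Nothing in the bytes
    // says which version wrote them, so the reader keeps reading in its own order.
    static String readV2(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            String url = in.readUTF();
            long hits = in.readLong();
            try {
                String referrer = in.readUTF();
                return url + " " + hits + " " + referrer;
            } catch (EOFException e) {
                return "EOF: old record has no referrer field";
            }
        }
    }
}
```

With Writables, handling both versions is left to the class author (for example, by writing a version byte by hand). Avro instead keeps the writer's schema alongside the data (in the header of an Avro data file) and resolves it against the reader's schema, filling a field the writer lacked from the reader schema's declared default.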