
Hadoop CustomWritables


I have more of a design question regarding the necessity of a CustomWritable for my use case:

I have a document pair that I will process through a pipeline, writing intermediate and final data out to HDFS. My key will be something like ObjectId - DocId - Pair - Lang. I do not see why, or whether, I will need a CustomWritable for this use case. I guess if I did not have a key, I would need a CustomWritable? Also, when I write data out to HDFS in the Reducer, I use a custom Partitioner. Would that eliminate my need for a CustomWritable?

I am not sure I have understood when a CustomWritable is needed. Can someone point me in the right direction?


Solution

  • Writables are used to serialize and deserialize objects. For example, a log entry can contain a timestamp, a user IP and the browser agent. You would implement your own WritableComparable for the key that identifies this entry, and a value class implementing Writable that reads and writes the attributes of the log entry.
    These serializations are just a handy way to get the data from a binary format into an object. Some frameworks, like HBase, still require byte arrays to persist the data. Without Writables you would have a lot of overhead converting the data yourself, and that conversion logic would clutter your code.
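
    As a rough sketch of what such a key could look like for the ObjectId - DocId - Pair - Lang case from the question: the class name DocPairKey, the String field types, and the toBytes/fromBytes helpers are assumptions made for illustration, not anything from the original post. To keep the example compilable without Hadoop on the classpath, it declares a local Writable interface with the same two methods as org.apache.hadoop.io.Writable; in a real job you would implement org.apache.hadoop.io.WritableComparable<DocPairKey> instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Objects;

// Local stand-in with the same two methods as org.apache.hadoop.io.Writable,
// so this sketch compiles on its own. Assumption: swap for the Hadoop type in a job.
interface Writable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical composite key; field names and String types are assumptions.
class DocPairKey implements Writable, Comparable<DocPairKey> {
    private String objectId = "";
    private String docId = "";
    private String pair = "";
    private String lang = "";

    // Hadoop creates key instances reflectively, so keep a no-arg constructor.
    DocPairKey() {}

    DocPairKey(String objectId, String docId, String pair, String lang) {
        this.objectId = objectId;
        this.docId = docId;
        this.pair = pair;
        this.lang = lang;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fields must be written in the same order readFields reads them.
        out.writeUTF(objectId);
        out.writeUTF(docId);
        out.writeUTF(pair);
        out.writeUTF(lang);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        objectId = in.readUTF();
        docId = in.readUTF();
        pair = in.readUTF();
        lang = in.readUTF();
    }

    @Override
    public int compareTo(DocPairKey o) {
        // Sort order used during the shuffle: field by field, left to right.
        int c = objectId.compareTo(o.objectId);
        if (c != 0) return c;
        c = docId.compareTo(o.docId);
        if (c != 0) return c;
        c = pair.compareTo(o.pair);
        if (c != 0) return c;
        return lang.compareTo(o.lang);
    }

    @Override
    public boolean equals(Object other) {
        if (!(other instanceof DocPairKey)) return false;
        DocPairKey o = (DocPairKey) other;
        return objectId.equals(o.objectId) && docId.equals(o.docId)
                && pair.equals(o.pair) && lang.equals(o.lang);
    }

    @Override
    public int hashCode() {
        // Hash-based partitioning relies on equal keys having equal hash codes.
        return Objects.hash(objectId, docId, pair, lang);
    }

    // Illustration-only helper: serialize a key to a byte array.
    static byte[] toBytes(DocPairKey key) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            key.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Illustration-only helper: rebuild a key from a byte array.
    static DocPairKey fromBytes(byte[] bytes) {
        try {
            DocPairKey key = new DocPairKey();
            key.readFields(new DataInputStream(new ByteArrayInputStream(bytes)));
            return key;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

    Whether this is worth it over concatenating the four parts into a single Text key is a judgment call. A dedicated class mainly buys you typed fields and an explicit sort order, at the cost of maintaining write, readFields, compareTo, equals and hashCode together.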