I would like to store the events data in Parquet format (e.g., on HDFS). Do I need to modify the code of the corresponding sinks, or is there a way around it, e.g., using a Flume interceptor? Thanks.
On the one hand, there was an issue in Cygnus about modifying the code with the goal of supporting multiple output formats when writing to HDFS. The modification was done, but only our custom JSON and CSV formats were implemented. This means the code is ready to be extended with a third format. I've added a new issue regarding specific Parquet support in OrionHDFSSink; if you finally decide to do the modification, I can assign you the issue :)
On the other hand, you can always use the native Flume HDFS sink (which persists the whole notified body) and, effectively, program a custom interceptor.
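For reference, a minimal `flume.conf` sketch of that second option might look like the following. Note that in Flume, interceptors are attached to the source, not the sink; the agent name, ports, paths and the `ParquetInterceptor` class are all hypothetical placeholders (you would have to implement the interceptor yourself):

```properties
# Hypothetical agent 'a1': HTTP source -> memory channel -> native HDFS sink
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = http
a1.sources.r1.port = 5050
a1.sources.r1.channels = c1

# Interceptors hang off the source; 'ParquetInterceptor' does not exist yet,
# it is the custom class you would need to write
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = com.example.ParquetInterceptor$Builder

a1.channels.c1.type = memory

a1.sinks.k1.channel = c1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/user/flume/events
# The built-in fileType values are SequenceFile, DataStream and
# CompressedStream; actually emitting Parquet files would still require
# custom serialization on top of this
a1.sinks.k1.hdfs.fileType = DataStream
```

Keep in mind that an interceptor can only transform events in flight; the final on-disk format is ultimately decided by the sink, which is why the Parquet-specific code is unavoidable in either approach.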
As you can see, in both cases you will have to code the Parquet part yourself (or wait until we have room to implement it).