hadoop, cassandra-0.7, brisk

Brisk for small files


I am new to Cassandra and Hadoop. While looking into integrating the two, I came across Brisk. From the description, I understand that Brisk replaces HDFS with CassandraFS. Is this replacement a solution to Hadoop's small-files problem? If so, what about large files? I currently need to implement a resource store that holds both large binary data files with their metadata and small files such as images.


Solution

  • It's both, really (although I think Brisk has now been rolled into a commercial product, DataStax Enterprise, and isn't being actively developed in its own right).

    Brisk includes CassandraFS (cfs), which is a drop-in replacement for HDFS, so it supports large files. Under the hood, these files are broken into chunks and stored in Cassandra rows/columns.

    For small files, you can store the data directly in native Cassandra rows rather than in CassandraFS, and run Hadoop jobs over those rows.
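
The two storage patterns described above can be sketched roughly as follows. This is only an illustration of the idea, not Brisk's actual implementation: a plain Python dict stands in for a Cassandra column family, and the key layout (`name:index`), the `chunk_count` metadata column, the function names, and the tiny chunk size are all assumptions made up for the example.

```python
# Illustrative only: a dict stands in for a Cassandra column family.
# The key layout and chunk size are assumptions, not Brisk's real CFS schema.

CHUNK_SIZE = 4  # bytes; tiny for demonstration, real block sizes are far larger


def store_small_file(table, name, data):
    """Store a small file whole, as a single column in one row."""
    table[name] = {"data": data}


def store_large_file(table, name, data, chunk_size=CHUNK_SIZE):
    """Split a large file into fixed-size chunks, one row per chunk."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    for index, chunk in enumerate(chunks):
        table[f"{name}:{index}"] = {"data": chunk}
    # A metadata row records how many chunk rows belong to this file.
    table[name] = {"chunk_count": len(chunks)}


def read_large_file(table, name):
    """Reassemble a chunked file by reading its chunk rows in order."""
    count = table[name]["chunk_count"]
    return b"".join(table[f"{name}:{i}"]["data"] for i in range(count))
```

For example, calling `store_small_file(table, "icon.png", image_bytes)` keeps an image in one row, while `store_large_file(table, "video.bin", video_bytes)` followed by `read_large_file(table, "video.bin")` round-trips a large binary through its chunk rows. With a real cluster, the dict operations would be replaced by reads and writes through a Cassandra client.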