I'm writing an application that uses Hector to access a Cassandra database. I have some situations where I only need to query one column, and some where I need to query multiple columns at once. Writing one method that takes an array of column names and returns a list of columns using SliceQuery would be simplest in terms of code, but I'm wondering whether there's a significant drawback to using SliceQuery for one column compared to using ColumnQuery.
In short, are there enough (or any) performance benefits of using ColumnQuery over SliceQuery for one column to make it worth the extra code to deal with a one-column case separately?
Looking at Hector's code, the difference between using a ColumnQuery (ThriftColumnQuery.java) and a SliceQuery (ThriftSliceQuery.java) is the thrift command being sent: "get" or "get_slice", respectively.
I didn't find exact documentation of how each of those operations is implemented on Cassandra's server, but I took a quick look at Cassandra's sources, and after examining CassandraServer.java I got the impression that the "get" operation is there more for the client's convenience than for better performance when querying a single column:
- To handle a "get" request, a SliceByNamesReadCommand instance is created and executed.
- To handle a "get_slice" request that specifies column names (i.e. using the setColumnNames method and not setRange), a SliceByNamesReadCommand instance is created for each of the wanted columns and then executed (the row is read only once though).
Bottom line: as far as I can see, there's not much more than the (negligible) overhead of creating some collections meant for handling multiple columns. If you're still worried, however, I believe it shouldn't be too difficult to handle the two cases differently when wrapping the use of Hector in your DAOs.
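If you do want to keep the two cases separate inside a DAO, here is a rough sketch of the idea. Note that ColumnReader, InMemoryReader and RowDao are hypothetical names made up for illustration and are not part of Hector; in a real DAO, getOne would presumably wrap a ColumnQuery and getSlice a SliceQuery with setColumnNames. The in-memory reader is only there so the sketch runs on its own.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the two Hector query types (NOT Hector's API).
interface ColumnReader {
    // Single-column path, analogous to a ColumnQuery ("get"); null if absent.
    String getOne(String rowKey, String columnName);

    // Multi-column path, analogous to a SliceQuery with column names ("get_slice").
    Map<String, String> getSlice(String rowKey, String... columnNames);
}

// In-memory fake used only to make the sketch runnable; it counts calls per path.
final class InMemoryReader implements ColumnReader {
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    int getCalls = 0;
    int sliceCalls = 0;

    void put(String rowKey, String columnName, String value) {
        rows.computeIfAbsent(rowKey, k -> new HashMap<>()).put(columnName, value);
    }

    @Override
    public String getOne(String rowKey, String columnName) {
        getCalls++;
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(columnName);
    }

    @Override
    public Map<String, String> getSlice(String rowKey, String... columnNames) {
        sliceCalls++;
        Map<String, String> result = new LinkedHashMap<>();
        Map<String, String> row = rows.get(rowKey);
        if (row == null) {
            return result;
        }
        for (String name : columnNames) {
            String value = row.get(name);
            if (value != null) {
                result.put(name, value);
            }
        }
        return result;
    }
}

// The DAO exposes one method; callers never see which path was taken.
final class RowDao {
    private final ColumnReader reader;

    RowDao(ColumnReader reader) {
        this.reader = reader;
    }

    Map<String, String> read(String rowKey, String... columnNames) {
        if (columnNames.length == 1) {
            // Exactly one column requested: use the single-column path.
            Map<String, String> result = new LinkedHashMap<>();
            String value = reader.getOne(rowKey, columnNames[0]);
            if (value != null) {
                result.put(columnNames[0], value);
            }
            return result;
        }
        // Zero or several columns: use the by-names slice path.
        return reader.getSlice(rowKey, columnNames);
    }
}
```

Since callers only ever see read(rowKey, columnNames...), you could start with the slice path alone and add the single-column branch later if measurements ever show it matters.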
Hope I managed to help.