Search code examples
cassandracql3datastax-java-driver

Using Cassandra and CQL3, how do you insert an entire wide row in a single request?


I want to insert a single row with 50,000 columns into Cassandra 1.2.8. Before inserting, I have all the data for the entire row ready to go (in memory):

+---------+------+------+------+------+-------+
|         | 0    | 1    | 2    | ...  | 49999 |
| row_id  +------+------+------+------+-------+
|         | text | text | text | ...  | text  |
+---------+------+------+------|------+-------+

The column names are integers, allowing slicing for pagination. The column values are a value at that particular index.

CQL3 table definition:

create table results (
    row_id text,
    index int,
    value text,
    primary key (row_id, index)
) 
with compact storage;

As I already have the row_id and all 50,000 name/value pairs in memory, I just want to insert a single row into Cassandra in a single request/operation so it is as fast as possible.

The only thing I can seem to find is to do execute the following 50,000 times:

INSERT INTO results (row_id, index, value) values (my_row_id, ?, ?);

the first ? is is an index counter (i) and the second ? is the text value to store at location i.

This takes a lot of time. Even when we put the above INSERTs into a batch, it takes a lot of time.

We have all the data we need (the complete row) in its entirety, I would assume it to be very easy to just say "here, Cassandra, store this data as a single row in one request", for example:

//EXAMPLE-BUT-INVALID CQL3 SYNTAX:
insert into results (row_id, (index,value)) values 
    ((0,text0), (1,text1), (2,text2), ..., (N,textN));

This example isn't possible via current CQL3 syntax, but I hope it illustrates the desired effect: everything would be inserted as a single query.

Is it possible to do this in CQL3 and the DataStax Java Driver? If not, I suppose I'll be forced to use Hector or the Astyanax driver and the Thrift batch_insert operation instead?


Solution

  • Edit: only 4 days after I posted this question regarding Cassandra 1.2.9, Cassandra 2.0 final was released. 2.0 supports batch prepared statements, which should be much faster than the non-batched CQL3 that was required to be used for C* < 2.0. We have not yet tested this to be sure.

    When this question was posted 4 days ago on 30 August 2013, it was not possible in CQL3 for C* versions less than 2.0. It was only possible via a Thrift client, e.g. Astyanax's MutationBatch.

    Per Alex's suggestion, I created CASSANDRA-5959 as a feature request, but it was marked as a duplicate to CASSANDRA-4693, which supposedly solved the issue for C* 2.0.