indexing cassandra hector dynamic-columns

Check via Hector if secondary index already exists for a dynamic column in Cassandra

After importing data into my Cassandra test cluster, I found out that I need secondary indexes on some of the columns. Since the data is already in the cluster, I want to achieve this by updating the ColumnFamilyDefinitions.

Now, the problem is: those columns are dynamic columns, so they are invisible to the getColumnMetaData() call.

How can I check via Hector whether a secondary index has already been created, and create one if it has not? (I think the creation part is covered in http://comments.gmane.org/gmane.comp.db.hector.user/3151 )

If this is not possible, do I have to copy all data from this dynamic column family into a static one?

Solution

You do not need to copy all the data from the dynamic column family into a static one.

How? Let me explain with an example. Suppose you have the following CF schema:

CREATE TABLE sample (
  KEY text PRIMARY KEY,
  flag boolean,
  name text
);

Note: I have created secondary indexes on flag and name.

Here is some data in the CF:

 KEY,1 | address,Kolkata | flag,True | id,1 | name,Abhijit
 KEY,2 | address,Kolkata | flag,True | id,2 | name,abc
 KEY,3 | address,Delhi | flag,True | id,3 | name,xyz
 KEY,4 | address,Delhi | flag,True | id,4 | name,pqr
 KEY,5 | address,Delhi | col1,Hi | flag,True | id,4 | name,pqr

From the data you can see that address, id and col1 were all created dynamically.

Now, if I run a query like this:

SELECT * FROM sample WHERE flag = TRUE AND col1 = 'Hi';

Note: col1 is not indexed, but I can still filter on it.

Output:

  KEY | address | col1 | flag | id | name
 -----+---------+------+------+----+------
    5 |   Delhi |   Hi | True |  4 |  pqr

Another query:

 SELECT * FROM sample WHERE flag = TRUE AND id >= 1 AND id < 5 AND address = 'Delhi';

Note: neither id nor address is indexed here, yet I still get the output.

Output:

  KEY,3 | address,Delhi | flag,True | id,3 | name,xyz
  KEY,4 | address,Delhi | flag,True | id,4 | name,pqr
  KEY,5 | address,Delhi | col1,Hi | flag,True | id,4 | name,pqr

So basically, if you have an indexed column whose value you always know (here flag, which is always True), you can filter on the remaining dynamic columns by combining them in the WHERE clause with an equality condition on that indexed, always-true column.
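To make the idea concrete, here is a minimal plain-Java sketch of the two steps such a query goes through: a lookup on the one indexed column, then a filter over the candidate rows on the non-indexed columns. It is only an in-memory model of the example data above, not Hector or Cassandra code; the class name IndexedFilterSketch and the helpers row, sampleRows, buildIndex and query are hypothetical names invented for this illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model; it does not talk to Cassandra.
class IndexedFilterSketch {

    // Builds one row from alternating column names and values.
    static Map<String, String> row(String... pairs) {
        Map<String, String> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            r.put(pairs[i], pairs[i + 1]);
        }
        return r;
    }

    // The five rows from the example data in this answer.
    static List<Map<String, String>> sampleRows() {
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(row("KEY", "1", "address", "Kolkata", "flag", "True", "id", "1", "name", "Abhijit"));
        rows.add(row("KEY", "2", "address", "Kolkata", "flag", "True", "id", "2", "name", "abc"));
        rows.add(row("KEY", "3", "address", "Delhi", "flag", "True", "id", "3", "name", "xyz"));
        rows.add(row("KEY", "4", "address", "Delhi", "flag", "True", "id", "4", "name", "pqr"));
        rows.add(row("KEY", "5", "address", "Delhi", "col1", "Hi", "flag", "True", "id", "4", "name", "pqr"));
        return rows;
    }

    // Toy secondary index: value of the indexed column -> rows holding that value.
    static Map<String, List<Map<String, String>>> buildIndex(
            List<Map<String, String>> rows, String column) {
        Map<String, List<Map<String, String>>> index = new HashMap<>();
        for (Map<String, String> r : rows) {
            String value = r.get(column);
            if (value != null) {
                index.computeIfAbsent(value, k -> new ArrayList<>()).add(r);
            }
        }
        return index;
    }

    // Step 1: fetch candidates through the index.
    // Step 2: keep only candidates matching every non-indexed equality filter.
    static List<Map<String, String>> query(
            Map<String, List<Map<String, String>>> index,
            String indexedValue,
            Map<String, String> otherEquals) {
        List<Map<String, String>> result = new ArrayList<>();
        List<Map<String, String>> candidates =
                index.getOrDefault(indexedValue, new ArrayList<>());
        for (Map<String, String> candidate : candidates) {
            boolean matches = true;
            for (Map.Entry<String, String> e : otherEquals.entrySet()) {
                if (!e.getValue().equals(candidate.get(e.getKey()))) {
                    matches = false;
                    break;
                }
            }
            if (matches) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, List<Map<String, String>>> flagIndex = buildIndex(sampleRows(), "flag");
        Map<String, String> filters = new HashMap<>();
        filters.put("col1", "Hi"); // col1 is not indexed
        for (Map<String, String> r : query(flagIndex, "True", filters)) {
            System.out.println(r);
        }
    }
}
```

Note the cost this model makes visible: because flag is True for every row, the index lookup returns all rows as candidates, so the filter step ends up examining the whole column family. The trick works, but it behaves more like a scan than a selective index lookup.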