After importing data into my Cassandra test cluster, I found out that I need secondary indexes on some of the columns. Since the data is already in the cluster, I want to add them by updating the ColumnFamilyDefinitions.
Now, the problem is: those columns are dynamic columns, so they are invisible to the getColumnMetaData() call.
How can I check via Hector whether a secondary index has already been created, and create one if it has not? (I think how to create it is covered in http://comments.gmane.org/gmane.comp.db.hector.user/3151 )
If this is not possible, do I have to copy all data from this dynamic column family into a static one?
There is no need to copy all the data from the dynamic column family into a static one.
So how? Let me explain with an example. Suppose you have the CF schema below:
CREATE TABLE sample (
KEY text PRIMARY KEY,
flag boolean,
name text
)
Note: I have created indexes on flag and name.
Here is some data in the CF:
KEY,1 | address,Kolkata | flag,True | id,1 | name,Abhijit
KEY,2 | address,Kolkata | flag,True | id,2 | name,abc
KEY,3 | address,Delhi | flag,True | id,3 | name,xyz
KEY,4 | address,Delhi | flag,True | id,4 | name,pqr
KEY,5 | address,Delhi | col1,Hi | flag,True | id,4 | name,pqr
From the data you can see that address, id and col1 were all created dynamically.
Now suppose I run a query like this:
SELECT * FROM sample WHERE flag = TRUE AND col1 = 'Hi';
Note: col1 is not indexed, but I can still filter on that field.
Output:
KEY | address | col1 | flag | id | name
-----+---------+------+------+----+------
5 | Delhi | Hi | True | 4 | pqr
Another query:
SELECT * FROM sample WHERE flag = TRUE AND id >= 1 AND id < 5 AND address = 'Delhi';
Note: here neither id nor address is indexed, but I still get the output.
Output:
KEY,3 | address,Delhi | flag,True | id,3 | name,xyz
KEY,4 | address,Delhi | flag,True | id,4 | name,pqr
KEY,5 | address,Delhi | col1,Hi | flag,True | id,4 | name,pqr
So basically, if you have an indexed column whose value you always know (here flag, which is always True), you can filter on the remaining dynamic columns by combining them with an equality condition on that indexed column.
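To make the idea concrete, here is a minimal sketch in plain Java. It is not Hector or Cassandra code: the class IndexedFilterSketch and its methods insert, query and sampleData are hypothetical names made up for illustration, and the in-memory maps only stand in for the column family and its indexes. It assumes the behaviour seen in the queries above: one equality condition on an indexed column picks the candidate rows, and the conditions on the non-indexed (dynamic) columns are then checked against those candidates.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical in-memory stand-in for a column family with secondary indexes. */
public class IndexedFilterSketch {
    // Row key -> (column name -> value); rows may carry arbitrary dynamic columns.
    private final Map<String, Map<String, String>> rows = new LinkedHashMap<>();
    // Indexed column name -> (column value -> keys of the rows holding that value).
    private final Map<String, Map<String, List<String>>> indexes = new HashMap<>();

    public IndexedFilterSketch(String... indexedColumns) {
        for (String column : indexedColumns) {
            indexes.put(column, new HashMap<>());
        }
    }

    /** Stores a row and updates every index; assumes each key is inserted only once. */
    public void insert(String key, String... nameValuePairs) {
        Map<String, String> row = new LinkedHashMap<>();
        for (int i = 0; i + 1 < nameValuePairs.length; i += 2) {
            row.put(nameValuePairs[i], nameValuePairs[i + 1]);
        }
        rows.put(key, row);
        for (Map.Entry<String, Map<String, List<String>>> index : indexes.entrySet()) {
            String value = row.get(index.getKey());
            if (value != null) {
                index.getValue().computeIfAbsent(value, v -> new ArrayList<>()).add(key);
            }
        }
    }

    /** Uses the index for the equality condition, then filters the candidates with 'rest'. */
    public List<String> query(String indexedColumn, String value,
                              Predicate<Map<String, String>> rest) {
        Map<String, List<String>> index = indexes.get(indexedColumn);
        if (index == null) {
            throw new IllegalArgumentException("No index on column: " + indexedColumn);
        }
        List<String> matches = new ArrayList<>();
        for (String key : index.getOrDefault(value, Collections.<String>emptyList())) {
            if (rest.test(rows.get(key))) {
                matches.add(key);
            }
        }
        return matches;
    }

    /** The five example rows from the answer, with indexes on flag and name. */
    public static IndexedFilterSketch sampleData() {
        IndexedFilterSketch cf = new IndexedFilterSketch("flag", "name");
        cf.insert("1", "address", "Kolkata", "flag", "True", "id", "1", "name", "Abhijit");
        cf.insert("2", "address", "Kolkata", "flag", "True", "id", "2", "name", "abc");
        cf.insert("3", "address", "Delhi", "flag", "True", "id", "3", "name", "xyz");
        cf.insert("4", "address", "Delhi", "flag", "True", "id", "4", "name", "pqr");
        cf.insert("5", "address", "Delhi", "col1", "Hi", "flag", "True", "id", "4", "name", "pqr");
        return cf;
    }

    public static void main(String[] args) {
        IndexedFilterSketch cf = sampleData();
        // flag = TRUE AND col1 = 'Hi'
        System.out.println(cf.query("flag", "True", r -> "Hi".equals(r.get("col1"))));
        // flag = TRUE AND id >= 1 AND id < 5 AND address = 'Delhi'
        System.out.println(cf.query("flag", "True", r -> {
            int id = Integer.parseInt(r.get("id"));
            return "Delhi".equals(r.get("address")) && id >= 1 && id < 5;
        }));
    }
}
```

Running main prints [5] and then [3, 4, 5], the same row keys as the two query outputs above. Note that in this sketch every row with flag = True is a candidate, so the non-indexed conditions are checked against all of them; an index on a column with more distinct values would narrow the candidates further.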