I'm attempting to connect to Cassandra in order to do a bulk insert. However, when I attempt to connect, I get an error.
The code I'm using:
from pycassa import columnfamily
from pycassa import pool
cassandra_ips = ['<an ip addr>']
conpool = pool.ConnectionPool('my_keyspace', cassandra_ips)
colfam = columnfamily.ColumnFamily(conpool, 'my_table')
However, this fails on the last line with:
pycassa.cassandra.ttypes.NotFoundException: NotFoundException(_message=None, why='Column family my_table not found.')
The column family definitely exists:
cqlsh> use my_keyspace
... ;
cqlsh:my_keyspace> desc tables;
my_table
cqlsh:my_keyspace>
And I don't think this is a simple typo in the table name: I've checked it a dozen times, and there is also this:
In [3]: sys_mgr = pycassa.system_manager.SystemManager(cassandra_ips[0])
In [4]: sys_mgr.get_keyspace_column_families('my_keyspace')
Out[4]: {}
Why is that {}?
If it matters:
The table was roughly created using:
CREATE TABLE my_table (
user_id int,
year_month int,
t timestamp,
<tons of other attributes>
PRIMARY KEY ((user_id, year_month), t)
) WITH compaction =
{ 'class' : 'LeveledCompactionStrategy', 'sstable_size_in_mb' : 160 };
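In order to access CQL3 tables via a Thrift API, such as pycassa, the tables must be created using compact storage. A CQL3 table created without it is not exposed over Thrift, which is why get_keyspace_column_families returns {} and the column family is reported as not found.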
CREATE TABLE my_table (
...
) WITH COMPACT STORAGE;
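To get a clearer failure than the NotFoundException, you could check whether the table is visible over Thrift before building the ColumnFamily. The following is a minimal sketch, not a definitive implementation. It uses only the pycassa calls already shown in the question, plus SystemManager.close(). The helper names thrift_visible and open_column_family are made up for illustration.

```python
def thrift_visible(sys_mgr, keyspace, column_family):
    """Return True if column_family is listed for keyspace over Thrift."""
    # get_keyspace_column_families returns a dict keyed by column family name;
    # a CQL3 table created without COMPACT STORAGE will not appear in it.
    return column_family in sys_mgr.get_keyspace_column_families(keyspace)


def open_column_family(server, keyspace, column_family):
    """Open a pycassa ColumnFamily, failing early with a clearer message."""
    # Imported here so thrift_visible can be used without pycassa installed.
    from pycassa import columnfamily, pool
    from pycassa.system_manager import SystemManager

    sys_mgr = SystemManager(server)
    try:
        if not thrift_visible(sys_mgr, keyspace, column_family):
            raise LookupError(
                "%s.%s is not visible over Thrift; was it created "
                "WITH COMPACT STORAGE?" % (keyspace, column_family)
            )
    finally:
        sys_mgr.close()

    conpool = pool.ConnectionPool(keyspace, [server])
    return columnfamily.ColumnFamily(conpool, column_family)
```

With the setup from the question, open_column_family(cassandra_ips[0], 'my_keyspace', 'my_table') would raise the LookupError until the table is recreated with compact storage.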
With regard to the primary key, from the docs:
Using the compact storage directive prevents you from defining more than one column that is not part of a compound primary key.
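The quoted rule can be written as a small check. This is only a sketch of that one sentence, not Cassandra's full schema validation. The function names are made up, and attr_a and attr_b are hypothetical stand-ins for the attributes elided in the question.

```python
def non_key_columns(columns, primary_key):
    """Return the columns that are not part of the primary key, in order."""
    key = set(primary_key)
    return [name for name in columns if name not in key]


def fits_quoted_compact_rule(columns, primary_key):
    """Apply the quoted rule: at most one column outside the primary key."""
    return len(non_key_columns(columns, primary_key)) <= 1


# Shape of the table in the question: three key columns plus several
# attributes (attr_a and attr_b are placeholders, not the real column names).
primary_key = ["user_id", "year_month", "t"]
wide_table = primary_key + ["attr_a", "attr_b"]
narrow_table = primary_key + ["attr_a"]

wide_ok = fits_quoted_compact_rule(wide_table, primary_key)
narrow_ok = fits_quoted_compact_rule(narrow_table, primary_key)
```

Under this reading, a table with many attributes alongside the key columns does not satisfy the rule (wide_ok is False), while one with a single value column does (narrow_ok is True).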
Currently you are using a composite partition key, but enabling compact storage limits you to a compound primary key. You do not have to reduce the key to a single column; each column just has to be part of the compound key. One final reference.