I am trying to map and insert data into Cassandra using Hive. Having only a little experience with both, I tried to read and follow this: http://www.datastax.com/docs/datastax_enterprise3.0/solutions/about_hive
From my understanding, this is what I do:
I create an external table in Hive using the CassandraStorageHandler, mapping to the right keyspace and to the Cassandra columns using the :key, :column and :value special names:
CREATE EXTERNAL TABLE test_table (myproductid INT , mydate TIMESTAMP , quantity BIGINT )
STORED BY 'org.apache.hadoop.hive.cassandra.CassandraStorageHandler'
WITH SERDEPROPERTIES ( "cassandra.ks.name" = "test",
"cassandra.columns.mapping" = ":key,:column,:value");
I insert data in Hive (nothing special):
INSERT OVERWRITE TABLE test_table
SELECT anId, aTimestamp, COUNT(*)
FROM myDataTable
GROUP BY anId, aTimestamp;
Everything seems fine from Hive: the table description is right and the data appears in the table as expected.
From Cassandra, DESCRIBE test_table shows:
CREATE TABLE test_table (
key blob,
column1 blob,
value blob,
PRIMARY KEY (key, column1)
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
The data is not typed, and if I SELECT it, it appears as binary data:
0x3238373639 | 0x323031332d30312d30322030303a30303a3030 | 0x31
I tried to follow the documentation and add this to the SERDEPROPERTIES:
"cassandra.cf.validatorType" = "Int32Type, DateType, LongType"
and also:
"cassandra.cql3.type" = "int, timestamp, bigint"
But neither made any difference.
I read some more documentation, specifically about UDFs to convert data types, but I'd like to avoid this overhead if it is not essential.
Am I missing something? Is there a step I skipped, or am I doing something wrong?
Thanks!
"cassandra.cf.validatorType" = "Int32Type, DateType, LongType" should match the Column Family/Table column data type. Your test_table columns are blob, so you have to encode it back to binary when you insert data to the table.
CassandraStorageHandler is deprecated. You can refer to http://www.datastax.com/docs/datastax_enterprise3.2/solutions/about_hive for the latest CqlStorageHandler.
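A rough sketch of a CQL-backed mapping (the handler class path and SERDEPROPERTIES names here are taken from the DSE 3.1/3.2 examples as I remember them, so treat them as assumptions and check the linked page for your version):
-- cqlsh: create the typed CQL3 table up front
CREATE TABLE test.test_table (
    myproductid int,
    mydate timestamp,
    quantity bigint,
    PRIMARY KEY (myproductid, mydate)
);

-- Hive: map the existing CQL3 table with the CQL storage handler
-- (class name per the DSE docs; verify it for your DSE release)
CREATE EXTERNAL TABLE test_table (myproductid INT, mydate TIMESTAMP, quantity BIGINT)
STORED BY 'org.apache.hadoop.hive.cassandra.cql3.CqlStorageHandler'
WITH SERDEPROPERTIES ("cassandra.ks.name" = "test",
                      "cassandra.cf.name" = "test_table");
As far as I can tell, the CQL storage handler maps Hive columns to CQL columns by name, so the :key,:column,:value mapping should not be needed at all.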