I have the following Cassandra table:
CREATE TABLE myflights.flights_by_airport2 (
origin text,
dep_time timestamp,
fl_date timestamp,
airline_id int,
carrier text,
fl_num int,
PRIMARY KEY ((origin), dep_time)
) WITH CLUSTERING ORDER BY (dep_time ASC);
cqlsh:myflights> select * from flights_by_airport2 limit 5;
origin | dep_time | airline_id | carrier | fl_date | fl_num
--------+---------------------------------+------------+---------+---------------------------------+--------
MSY | 2012-01-01 05:57:00.000000+0000 | 19977 | UA | 2012-01-01 00:00:00.000000+0000 | 275
MSY | 2012-01-01 06:01:00.000000+0000 | 20409 | B6 | 2012-01-01 00:00:00.000000+0000 | 110
MSY | 2012-01-01 06:13:00.000000+0000 | 19790 | DL | 2012-01-01 00:00:00.000000+0000 | 551
MSY | 2012-01-01 06:45:00.000000+0000 | 19805 | AA | 2012-01-01 00:00:00.000000+0000 | 1190
MSY | 2012-01-01 06:46:00.000000+0000 | 19977 | UA | 2012-01-01 00:00:00.000000+0000 | 1184
The following statement returns no data:
cqlsh:myflights> select * from flights_by_airport2 where origin = 'MSY';
origin | dep_time | airline_id | carrier | fl_date | fl_num
--------+----------+------------+---------+---------+--------
(0 rows)
I have single-node Cassandra and Spark (a DSE 6 cluster) installed on an Ubuntu VM.
Since my comment led to the solution, I'm posting it as an answer:
I would first check if it is a whitespace issue by running
select * from flights_by_airport2 where origin contains 'MSY';
If whitespace turns out to be the problem, you could try using trim, or just clean the data.
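If you go the "clean the data" route, here is a minimal sketch of what that check and cleanup might look like in Python. It assumes the origin values have already been pulled out of the table into a plain list (for example via an export or a driver query); the function names and the sample values are hypothetical and only illustrate the idea of spotting padded keys and stripping them before reloading.

```python
def find_padded_keys(values):
    """Return the values that differ from their whitespace-stripped form."""
    return [v for v in values if v != v.strip()]


def clean_key(value):
    """Strip leading and trailing whitespace from a key value."""
    return value.strip()


# Hypothetical sample of origin values, some with stray whitespace.
origins = ["MSY ", "MSY", " JFK", "LAX"]

# repr() makes otherwise invisible padding visible when printed.
padded = find_padded_keys(origins)
print([repr(v) for v in padded])

# Distinct, cleaned keys that could be written back to the table.
cleaned = sorted({clean_key(v) for v in origins})
print(cleaned)
```

Printing with repr() is the useful part here: a value such as 'MSY ' looks identical to 'MSY' in cqlsh output, but it is a different partition key, which would explain why the equality match returns nothing.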