Given are two tables in Cassandra:
xs (id bigint, x bigint)
0,10
1,11
2,12
3,13
4,14
ys (id bigint, y bigint)
0,100
2,144
4,196
In this minimal example I would like to get all rows from xs
having an id
not contained in ys
yet, i.e.
select
1,11
3,13
How can I achieve this?
Based on the selected rows I want to create new rows in ys
, basically synchronizing (addition only) ys
to xs
.
Here is my code to create the keyspaces/tables:
from cassandra.cluster import Cluster
cluster = Cluster()
session = cluster.connect()
session.execute('DROP KEYSPACE keyspace_x;')
session.execute('DROP KEYSPACE keyspace_y;')
session.execute('''
CREATE KEYSPACE keyspace_x
WITH replication = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
};''');
session.execute('''
CREATE KEYSPACE keyspace_y
WITH replication = {
'class' : 'SimpleStrategy',
'replication_factor' : 1
};''');
session.execute('''CREATE TABLE keyspace_x.xs (
id bigint,
x bigint,
PRIMARY KEY (id)
);''');
session.execute('''CREATE TABLE keyspace_y.ys (
id bigint,
y bigint,
PRIMARY KEY (id)
);''');
session.execute('INSERT INTO keyspace_x.xs(id, x) VALUES (%s, %s)',(0, 10))
session.execute('INSERT INTO keyspace_x.xs(id, x) VALUES (%s, %s)',(1, 11))
session.execute('INSERT INTO keyspace_x.xs(id, x) VALUES (%s, %s)',(2, 12))
session.execute('INSERT INTO keyspace_x.xs(id, x) VALUES (%s, %s)',(3, 13))
session.execute('INSERT INTO keyspace_x.xs(id, x) VALUES (%s, %s)',(4, 14))
session.execute('INSERT INTO keyspace_y.ys(id, y) VALUES (%s, %s)',(0, 100))
session.execute('INSERT INTO keyspace_y.ys(id, y) VALUES (%s, %s)',(2, 144))
session.execute('INSERT INTO keyspace_y.ys(id, y) VALUES (%s, %s)',(4, 196))
In SQL I would use something like that
SELECT xs.id, xs.x FROM xs
LEFT JOIN ys ON xs.id = ys.id
WHERE ys.id IS NULL;
But what about CQL?
Currently I just fetch all IDs from both tables and do a set difference in my application, but I guess this is far from optimal.
There is no multi-table functionality. How implement LEFT or RIGHT JOIN using spark-cassandra-connector You can add relational layers onto Cassandra. How to do a join queries with 2 or more tables in cassandra cql You can fruitlessly search the manual. http://cassandra.apache.org/doc/latest/cql/dml.html Re browsing/googling: Other SQL ways to do what you want are EXCEPT
aka MINUS
& NOT IN
. Algebraic functionality is DIFFERENCE
aka MINUS
& ANTIJOIN
).