Search code examples
sqlcassandracqlcqlsh

Updating a column in a table based on its value from another table on id match


I have two tables:

activities (
    org_id bigint,
    user_id bigint,
    lang text,
    timestamp bigint,
    ...
    PRIMARY KEY ((org_id, user_id), timestamp)

and

metadata (
    user_id text,
    org_id text,
    lang text,
    date_str text,
    ...
    PRIMARY KEY ((user_id, org_id), date_str)

Given: activities.rows >> metadata.rows

I need to

update metadata and set metadata.lang = activities.lang for all rows in metadata,
where metadata.user_id = activities.user_id and metadata.org_id = activities.org_id

Q: What would be an elegant short cql query to achieve the same?

I tried:

update metadata set metadata.lang = (select activities.lang from
activities where activities.user_id = metadata.user_id and activities.org_id = metadata.org_id)

which obviously fails for cardinality violations.


Solution

  • CQL doesn't support this kind of update statements (see the docs). You need to provide actual values in the where condition.

    So you'll need to do everything in your code (fetch data, generate update statements, etc.), or you can use something like Spark, with something like this (not tested):

    metadata = spark.read\
        .format("org.apache.spark.sql.cassandra")\
        .options(table="metadata", keyspace="test")\
        .load()\
        .select("user_id", "org_id")
    activities = spark.read\
        .format("org.apache.spark.sql.cassandra")\
        .options(table="activities", keyspace="test")\
        .load()\
        .select("user_id", "org_id", "lang")
    joined = activities.join(metadata, 
       (metadata["user_id"] == activitites["user_id"]) &
       (metadata["org_id"] == activities["org_id"]), "inner")\
       .select(metadata["user_id"], metadata["org_id"], "lang")
    joined.write\
        .format("org.apache.spark.sql.cassandra")\
        .mode('append')\
        .options(table="metadata", keyspace="test")\
        .save()