Tags: cassandra, scylla, cassandra-stress

ScyllaDB schema causes issues when imported with cassandra-stress


I'm currently running ScyllaDB in my environment and, for technical reasons, am looking into moving to Cassandra. I'm trying to get cassandra-stress to load a Cassandra cluster with data, using a schema as close as possible to the one currently used in ScyllaDB. Sadly, there are some issues.

The environment:

  • ScyllaDB 3.0.7 (= Cassandra 3.0.8) running on Ubuntu 18.04
  • Cassandra 3.11.4 running on Ubuntu 18.04
  • cassandra-stress 3.0.18 (part of cassandra-tools pkg) running on Ubuntu 18.04

The process is as follows:

  • dump the schema from ScyllaDB (desc keyspace_name)
  • prepare the cassandra-stress yaml file - one keyspace, five tables total
  • run cassandra-stress (cassandra-stress user profile=schema.yml cl=QUORUM duration=30s 'ops(insert=1)' -node 172.19.11.9 -rate threads=1)

Just to be sure there are no keyspace-related issues, every run of cassandra-stress is done on a new keyspace (I increment the name each time).

When the schema is a 1:1 copy of the one dumped from Scylla, the definitions of two tables (and only those two) cause cassandra-stress to fail with: com.datastax.driver.core.exceptions.SyntaxError: line 1:35 no viable alternative at input 'WHERE' (UPDATE "activities_bp_action" SET [WHERE]...).

The table definitions are as follows:

table: activities_bp
table_definition: |
  CREATE TABLE activities_bp  (
    business_profile_id int,
    create_date timestamp,
    event_uuid uuid,
    PRIMARY KEY (business_profile_id, create_date, event_uuid)
  ) WITH CLUSTERING ORDER BY (create_date DESC, event_uuid ASC)
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.DeflateCompressor'}
table: activities_bp_action
table_definition: |
  CREATE TABLE activities_bp_action  (
    business_profile_id int,
    action text,
    create_date timestamp,
    event_uuid uuid,
    PRIMARY KEY ((business_profile_id, action), create_date, event_uuid)
  ) WITH CLUSTERING ORDER BY (create_date DESC, event_uuid ASC)
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.DeflateCompressor'}

If the two lines containing PRIMARY KEY and CLUSTERING ORDER are replaced with the following, cassandra-stress runs fine with no errors and starts to fill the cluster with data. However, the definitions have now drifted from the ScyllaDB ones:

    PRIMARY KEY (event_uuid, create_date)
  ) WITH CLUSTERING ORDER BY (create_date DESC)
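
One possible reading of the error, offered as an assumption rather than a confirmed diagnosis: in both failing tables every column is part of the primary key, while the modified key leaves some columns outside it. If cassandra-stress writes rows with an UPDATE that places non-key columns in SET and key columns in WHERE (which is what the statement in the error message suggests), the original definitions would leave SET empty. The Python sketch below is purely illustrative; `build_update` is a hypothetical helper and not cassandra-stress code.

```python
# Illustrative only: this is NOT cassandra-stress code. It assumes the tool
# writes rows with an UPDATE that puts non-key columns in SET and key
# columns in WHERE, which is what the error message suggests.

def build_update(table, columns, primary_key):
    """Return an UPDATE statement in the shape shown in the error message."""
    non_key = [c for c in columns if c not in primary_key]
    set_clause = ", ".join(f"{c} = ?" for c in non_key)
    where_clause = " AND ".join(f"{c} = ?" for c in primary_key)
    return f'UPDATE "{table}" SET {set_clause} WHERE {where_clause}'

columns = ["business_profile_id", "action", "create_date", "event_uuid"]

# Original key: every column is part of the primary key, so SET is empty.
original = build_update(
    "activities_bp_action",
    columns,
    ["business_profile_id", "action", "create_date", "event_uuid"],
)

# Modified key: two columns are left over for the SET clause.
modified = build_update(
    "activities_bp_action", columns, ["event_uuid", "create_date"]
)

print(original)
print(modified)
```

In this sketch the statement built from the original key has nothing between SET and WHERE, and WHERE starts at character offset 35, the same position the parser reports as `line 1:35`. The statement built from the modified key has a non-empty SET clause.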

After running cassandra-stress with the modified definition, I can roll back to the unmodified one (the one that used to fail). Run against the already existing keyspace, the yaml now works fine and fills the cluster with data. Does that suggest the problem occurs while creating the tables?

I was not able to find the full query that cassandra-stress shows in its stack trace, even with both cassandra-stress and Cassandra running in debug mode, and the query puzzles me a little.

Any ideas why the problem occurs? Thanks!

edit:

Attaching schema.yml: https://gist.github.com/schybbkoh/76cdbf19a2bb933419063526ff5ac44f

edit:

As it turns out, the "runs fine with no errors and starts to fill up the cluster with data" schema creates and fills with data only the last table defined in the schema. Something's wrong here.
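
A hedged guess at why only the last table survives: the excerpt above repeats `table:` and `table_definition:` as top-level keys. If the profile is loaded as a single mapping in which a repeated key overwrites the earlier value (an assumption, not verified against the cassandra-stress source), only the last pair would remain. The Python sketch below models that behaviour with a plain dict; it does not parse YAML and is not how cassandra-stress reads its profile.

```python
# Illustrative only: NOT how cassandra-stress parses its profile.
# Assumption: the profile is read as a single mapping, so a repeated
# top-level key overwrites the earlier value.

# Key/value pairs in the order they appear in the excerpt
# (table definitions shortened to placeholders).
entries = [
    ("table", "activities_bp"),
    ("table_definition", "CREATE TABLE activities_bp (...)"),
    ("table", "activities_bp_action"),
    ("table_definition", "CREATE TABLE activities_bp_action (...)"),
]

profile = {}
for key, value in entries:
    profile[key] = value  # a later duplicate replaces the earlier entry

print(profile["table"])  # the table name that survives
print(len(profile))      # number of distinct keys left in the mapping
```

Under that assumption the mapping ends up with one `table` and one `table_definition`, both from the last entry, which would match the observation that only the last table is created and filled.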


Solution

  • All right, problem solved. There were two issues: