Tags: datastax-enterprise, spark-submit

Error when submitting a Spark application in DataStax Enterprise


I get this error when trying to submit an app from the master node:
dse -u abc -p abc spark-submit --conf spark.cassandra.auth.username=abc --conf spark.cassandra.auth.password=abc --conf spark.debug.maxToStringFields=10000 --conf spark.executor.memory=4G app.py

I'm using 3 DSE Analytics nodes in 1 datacenter (4 cores/16 GB RAM per node) and submit the app from the master node. When I check the tasks/stages I see this error:

[screenshot of the failed tasks/stages from the Spark UI]

Has anybody seen this error before?


Solution

  • The problem is in the application that writes data into your tables: either it deletes a lot of data, or (most probably) it inserts nulls as part of "normal" inserts. In both cases tombstones are generated, and when there are too many of them, queries start to fail.

    What you can do:

    • stop inserting nulls as part of the data. If you're using Spark to write the data, run the job with --conf spark.cassandra.output.ignoreNulls=true - this prevents writing nulls, but it may not work well when you need to overwrite existing data (see the first sketch after this list). If you're using other driver(s), use unset for fields that have a null value;
    • don't delete data by individual columns; delete by rows/ranges/partitions instead;
    • expire tombstones faster - if you can, use a lower gc_grace_seconds (second sketch after this list), but this comes with its own challenges - I recommend reading this article for a better understanding of the problem.
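
A minimal PySpark sketch of the ignoreNulls option, assuming a Spark Cassandra Connector write to hypothetical keyspace/table names (my_ks, my_table) and a hypothetical input path - adjust to your own app.py:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("write-without-nulls")
        # Tell the Spark Cassandra Connector to skip null columns on write,
        # so no tombstones are created for them. Note: skipped columns keep
        # their previously stored values, which matters when overwriting rows.
        .config("spark.cassandra.output.ignoreNulls", "true")
        .getOrCreate()
    )

    df = spark.read.parquet("/path/to/input")  # hypothetical source

    (df.write
       .format("org.apache.spark.sql.cassandra")
       .options(keyspace="my_ks", table="my_table")  # hypothetical names
       .mode("append")
       .save())

The same setting can also be passed on the command line as --conf spark.cassandra.output.ignoreNulls=true, exactly as in the answer above.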
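
And a sketch of lowering gc_grace_seconds with the DataStax Python driver, again with hypothetical contact point, credentials, and table names, and an example value of 1 hour - pick a window that is shorter than your repair interval only if you really understand the trade-off:

    from cassandra.cluster import Cluster
    from cassandra.auth import PlainTextAuthProvider

    # Hypothetical contact point and credentials.
    auth = PlainTextAuthProvider(username="abc", password="abc")
    cluster = Cluster(["10.0.0.1"], auth_provider=auth)
    session = cluster.connect()

    # Lower gc_grace_seconds so tombstones become eligible for eviction sooner.
    # Repairs must complete within this window, or deleted data can resurrect.
    session.execute("ALTER TABLE my_ks.my_table WITH gc_grace_seconds = 3600")

    cluster.shutdown()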