Tags: cassandra, cassandra-3.0, cqlsh

Is there a way to resume from where COPY TO failed?


I am using Cassandra's COPY command (docs can be found at https://docs.datastax.com/en/cql-oss/3.x/cql/cql_reference/cqlshCopy.html) to export a large table to CSV, and I have even larger ones to export after this.

The command I used is:

COPY my_table_name TO 'my_table_name.csv' 

After running for 12 hours (and creating a 289GB file) I got the following error:

Error for (3598295844520231142, 3615644561192297385): ReadFailure - Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed - received 0 responses and 1 failures" info={'failures': 1, 'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} (permanently given up after 349000 rows and 1 attempts)
Exported 1280 ranges out of 1281 total ranges, some records might be missing

I would like to know if there is a way to continue from the point of failure. The error says "Exported 1280 ranges out of 1281 total ranges"; is there a way to specify just the last range so I don't need to export the entire table again?

The docs mention BEGINTOKEN and ENDTOKEN, but I'm not clear on what they mean or whether they can help me.

Is there perhaps a more robust method to export a table?


Solution

  • As a rule of thumb, the COPY command should only be used for tables of up to about 1 million records. It is easy to use, but it is pretty slow (it runs in Python inside cqlsh) and not very robust.

    There are free tools that achieve the same thing; I am thinking especially of DSBulk. To create a CSV, use its unload command (see the DSBulk unload documentation).

    DSBulk gives you many options for the export and a checkpoint mechanism to restart where you stopped. The output is split into multiple CSV files, which are easier to move around later on.

    dsbulk unload -url ~/data-export -k ks1 -t table1
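On the BEGINTOKEN/ENDTOKEN part of the question: those COPY TO options limit the export to a token range, so the single range named in the error can be exported on its own. The Python sketch below (the helpers `failed_range`, `split_range` and `copy_command` are hypothetical, not part of cqlsh) reads the range out of the error line and prints the matching cqlsh statement; it can also cut the whole ring into smaller chunks so a failure only costs one chunk. It assumes the table uses Murmur3Partitioner (tokens from -2^63 to 2^63-1) and that cqlsh applies the options as token > BEGINTOKEN AND token <= ENDTOKEN, the usual start-exclusive, end-inclusive convention for token ranges; verify both for your cluster.

```python
import re

# Assumption: Murmur3Partitioner (Cassandra's default) token bounds.
MIN_TOKEN = -(2 ** 63)
MAX_TOKEN = 2 ** 63 - 1

# Matches the "(start, end)" pair in cqlsh's "Error for (...)" line.
_RANGE_RE = re.compile(r"Error for \((-?\d+), (-?\d+)\)")


def failed_range(error_line):
    """Hypothetical helper: return the (begin, end) tokens from a COPY TO error line."""
    match = _RANGE_RE.search(error_line)
    if match is None:
        raise ValueError("no token range found in: %r" % error_line)
    return int(match.group(1)), int(match.group(2))


def split_range(begin, end, parts):
    """Hypothetical helper: split (begin, end] into `parts` contiguous sub-ranges.

    Adjacent sub-ranges share a boundary, which gives no gaps and no overlaps
    under the assumed start-exclusive, end-inclusive semantics.
    """
    if parts < 1 or end - begin < parts:
        raise ValueError("need parts >= 1 and at least one token per part")
    step = (end - begin) // parts
    bounds = [begin + i * step for i in range(parts)] + [end]
    return list(zip(bounds[:-1], bounds[1:]))


def copy_command(table, path, begin, end):
    """Hypothetical helper: build a cqlsh COPY TO statement for the range (begin, end]."""
    # Token values are quoted so negative numbers parse as plain option values.
    return (
        "COPY %s TO '%s' WITH BEGINTOKEN = '%d' AND ENDTOKEN = '%d';"
        % (table, path, begin, end)
    )


if __name__ == "__main__":
    line = "Error for (3598295844520231142, 3615644561192297385): ReadFailure ..."
    begin, end = failed_range(line)
    # Re-export only the range that failed, into a separate file.
    print(copy_command("my_table_name", "my_table_name_retry.csv", begin, end))
    # Alternatively, export the whole ring as 16 independent chunks.
    for i, (b, e) in enumerate(split_range(MIN_TOKEN, MAX_TOKEN, 16)):
        print(copy_command("my_table_name", "my_table_name_%02d.csv" % i, b, e))
```

Run each printed statement in cqlsh, qualifying the table with its keyspace if needed. Because the failed range was given up after 349000 rows, the original file may already contain some of its rows, so de-duplicate on the primary key if that matters. The range also failed on the server side (ReadFailure), so a retry may fail the same way until that cause is fixed; a smaller PAGESIZE or a longer PAGETIMEOUT sometimes helps.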