
How to stream data from Cassandra with a Go query


I have the following code:

cluster := gocql.NewCluster("our-cass")
cass, err := cluster.CreateSession()
if err != nil {
    log.Fatal(err)
}
defer cass.Close()

iter := cass.Query(`SELECT * FROM cmuser.users LIMIT 9999999999;`).Iter()
c := iter.Columns()
scanArgs := make([]interface{}, len(c))

for i := 0; i < len(scanArgs); i++ {
    scanArgs[i] = makeType(c[i])
}

for iter.Scan(scanArgs...) { ... }

The problem is that the table has far too many rows to load at once, but I need to read all of them in order to migrate the data to another database. Is there a way to stream the data from Cassandra? Unfortunately, the table's primary key is a UUID rather than a sequence, so we can't use the simple technique of two for loops, one of which increments a counter, to walk through all the rows.


Solution

  • gocql has options for paging (assuming you are running Cassandra 2.0 or later).

    gocql's Session has a method, SetPageSize,

    and gocql's Query has a similar method, PageSize.

    These let you break the query into pages instead of fetching everything at once. Here's what the code would look like:

    cluster := gocql.NewCluster("our-cass")
    cass, err := cluster.CreateSession()
    if err != nil {
        log.Fatal(err)
    }
    defer cass.Close()

    // Ask for 5000 rows per page instead of the whole table at once.
    iter := cass.Query(`SELECT * FROM cmuser.users;`).PageSize(5000).Iter()

    // Use iter as usual to iterate over all results;
    // gocql fetches further pages from Cassandra as they are needed.
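
To make the "use the iter as usual" step concrete, here is a minimal sketch of a row-at-a-time loop over a paged iterator. It swaps the question's Scan-with-makeType approach for MapScan, which is a different technique and not necessarily what the asker wants. The names streamRows, rowIter, and sliceIter are hypothetical and introduced only for illustration. rowIter lists the two methods the loop needs, both of which *gocql.Iter provides, so the real iterator should be usable directly; sliceIter is an in-memory stand-in so the loop can be tried without a cluster.

```go
package main

import "fmt"

// rowIter is the subset of gocql's *Iter that this sketch relies on.
// *gocql.Iter has both methods, so a real iterator can be passed in directly.
type rowIter interface {
	MapScan(m map[string]interface{}) bool
	Close() error
}

// streamRows (hypothetical helper) passes each row to handle as soon as it is
// scanned, so rows are never accumulated in a slice. It returns the number of
// rows handled successfully.
func streamRows(it rowIter, handle func(row map[string]interface{}) error) (int, error) {
	n := 0
	for {
		// gocql expects a fresh map for every MapScan call.
		row := make(map[string]interface{})
		if !it.MapScan(row) {
			break
		}
		if err := handle(row); err != nil {
			it.Close() // release the iterator; the handler error takes priority
			return n, fmt.Errorf("handling row %d: %w", n+1, err)
		}
		n++
	}
	// Close reports any error that occurred while fetching a page.
	if err := it.Close(); err != nil {
		return n, fmt.Errorf("iterating rows: %w", err)
	}
	return n, nil
}

// sliceIter is an in-memory stand-in used only to exercise streamRows.
type sliceIter struct {
	rows []map[string]interface{}
	pos  int
}

func (s *sliceIter) MapScan(m map[string]interface{}) bool {
	if s.pos >= len(s.rows) {
		return false
	}
	for k, v := range s.rows[s.pos] {
		m[k] = v
	}
	s.pos++
	return true
}

func (s *sliceIter) Close() error { return nil }

// With gocql, the call would look roughly like this:
//
//	iter := cass.Query(`SELECT * FROM cmuser.users;`).PageSize(5000).Iter()
//	n, err := streamRows(iter, func(row map[string]interface{}) error {
//		// write row to the target database here
//		return nil
//	})
```

Checking the error from Close matters here: a failure while fetching a later page only shows up as the scan loop ending early, so without that check a partial migration could look like a complete one.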