Search code examples
cassandra

SUM() of large number of Cassandra rows returns negative value


I'm trying to get sum of column in cassandra but it return negetive value although there is no negetive row in my column.

cqlsh:samt> select cost from items where cost<0 allow filtering ;

 cost
------

(0 rows)

But when i try to query sum of column it return :

select sum(cost) from items;
system.sum(cost)
------------------
        -18485190

the number of rows I'm trying to aggregate in is more than a million, is this the cause?


Solution

  • Welcome to Stack Overflow! You haven't provided much detail so I can only guess that the column values are so large that when they are all added together, it results in an integer overflow.

    In any case, unbounded queries are expensive since they require a full table scan and can affect the performance of the cluster especially if run in production. If you need to run full table aggregation, you should run them in Spark.

    Finally for future reference, you need to provide sufficient background information when asking questions. The general guidance is that you (a) provide a good summary of the problem that includes software/component versions, the full error message + full stack trace; (b) describe what you've tried to fix the problem, details of investigation you've done; and (c) minimal sample code that replicates the problem. Cheers!