Search code examples
spring-bootcassandra

Is Cassandra Queuing More Efficient When Queuing by a Single Component of the Compound Partition Key?


I have a problem in Cassandra that I would like to query data by a column that has high cardinality. Now, in order to have a more balanced database, the partition key needs to be chosen in a way that partitions have more equal sizes. But for me, it is valuable to be able to query by a column that has high cardinality on one hand, but on the other hand, store data efficiently between the partitions.

Let's consider the scenario where I want to store alarms for a stock price. I have this table:

CREATE TABLE drop_price_alarm (
        userId bigint,
        symbol text,
        price decimal,
        PRIMARY KEY ((symbol, userId), price)
        );

In my understanding, I would be able to efficiently query all records that have a stock price bigger than some value, and also making the (symbol, userId) as a compound partition key would ensure that the records are distributed reasonably between the partitions. As some popular stocks like Apple will have more records, and still, I have the order preserved by the price, so Cassandra would be able to easily retrieve all alarms bigger than a certain stock price.

The problem comes when I run this in Java; I get this error:

Caused by: org.springframework.data.cassandra.CassandraInvalidQueryException: Query; CQL [SELECT * FROM droppricealarm WHERE symbol=? AND price<=?]; Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING

This is my table:

@Table
public class DropPriceAlarm {

    @PrimaryKeyColumn(
            name = "symbol", ordinal = 1, type = PrimaryKeyType.PARTITIONED)
    private String symbol;
    @PrimaryKeyColumn(
            name = "userId", ordinal = 2, type = PrimaryKeyType.PARTITIONED)
    private long userId;


    @PrimaryKeyColumn(
            name = "price", ordinal = 1, type = PrimaryKeyType.CLUSTERED)
    private BigDecimal price;

    public DropPriceAlarm(String symbol, long userId, BigDecimal price) {
        this.symbol = symbol;
        this.userId = userId;
        this.price = price;
    }

    public long getUserId() {
        return userId;
    }

    public void setUserId(long userId) {
        this.userId = userId;
    }

    public String getSymbol() {
        return symbol;
    }

    public void setSymbol(String symbol) {
        this.symbol = symbol;
    }

    public BigDecimal getPrice() {
        return price;
    }

    public void setPrice(BigDecimal price) {
        this.price = price;
    }

}

And this is my repository:

    List<DropPriceAlarm> findBySymbolAndUserIdAndPriceLessThanEqual(String symbol, long userId, BigDecimal price);
}

Why would i get this error, and what is not correct with my reasoning?


Solution

  • The issue is that the partition key is not being included within the query. The partitioning mechanism does distribute the data to the nodes based on a consistent hash mechanism (murmur3 by default).

    To retrieve the data, by default you should be providing the partition key since that is what is required to know where to obtain the data from.

    The clustering order, based on price in your example, only orders within a partition, it is not a global sort order of any kind.

    Given you wish to retrieve data based on a price, you have three options:

    • using an SAI index (C* 5.0 / Datastax Astra), but this will be an expensive query since it is a scatter-gather query which can ask every node to retrieve the data and then order on the co-ordinator.
    • using spark-sql which will parallel query all the token ranges and pull the data to spark to order, this would not be a fast / low-latency operation.
    • remodel the data to match the query patterns. (not enough information on the patterns is within the question.)