When attempting to perform this query:
select race_name from sport_app.month_category_runner where race_type = 'URBAN RACE 10K' and club = 'CORNELLA ATLETIC';
I get the following error:
Cannot execute this query as it might involve data filtering and thus may have unpredictable performance. If you want to execute this query despite the performance unpredictability, use ALLOW FILTERING
It is an exercise, so I am not allowed to use ALLOW FILTERING.
So I have created two indexes in this way:
create index raceTypeIndex ON sport_app.month_category_runner(race_type);
create index clubIndex ON sport_app.month_category_runner(club);
But I keep getting the same error, am I missing something, or is there an alternative?
Table Structure:
CREATE TABLE month_category_runner (month text,
category text,
runner_id text,
club text,
race_name text,
race_type text,
race_date timestamp,
total_runners int,
net_time time,
PRIMARY KEY (month, category, runner_id, race_name, net_time));
Note if you add the "ALLOW FILTERING" the query will run on all the nodes of Cassandra cluster and can have a large impact on all nodes.
The recommendation is to add the partition as condition of your query, to allow the query to be executed on needed nodes only.
Example:
select race_name from month_category_runner where month = 'may' and club = 'CORNELLA ATLETIC';
select race_name from month_category_runner where month = 'may' and race_type = 'URBAN RACE 10K';
select race_name from month_category_runner where month = 'may' and race_type = 'URBAN RACE 10K' and club = 'CORNELLA ATLETIC' ALLOW FILTERING;
Your primary key is composed by (month, category, runner_id, race_name, net_time) and the column month is the partition, so this column must be on your query filter as i showed in example.
The query that you want to do using two columns that are not in primary key despite the index column exist, you need to use the ALLOW FILTERING that can have performance impact;
The other option is create a new table where the primary key contains theses columns.