design-patterns cassandra database-partitioning

Suggestion required to design partitioning in cassandra

I have to design the database for customers having prices for millions of materials they acquire through multiple suppliers for the next 24 months. So the database will store prices on a daily basis for every material supplied by a specific supplier for the next 24 months. So we keep past data. Now lookups will happen on:

Find the price for a material by the supplier as of the specific date by a customer.
Find the price for a material by the supplier for a time period by a customer.

I can think for the primary key as:

Partition key: (customer Id, material Id, supplier Id, date) -- will this end up in perf issues as it will make so many partitions in long run?
Partition Key: (customer Id, material Id, supplier Id, monthbucket), clustering key: date -- monthbucket will store data for material on monthly basis on same partition and will have value like '202002' for Feb 2020 dates.

Another question is how can I make sure my data distributes evenly across nodes.

Note:

combination of customer, material, supplier and date is unique.
two customer can have similar material ids.

Key Points: 1. Some customers can have a very small data set while others can have huge data. How well we can distribute data evenly across partitions as Date is a constant field for all customers. Also, material Ids can be the same among different customers as that is there internal representation (maybe a numeric or alphanumeric)

Number of Suppliers per customer and material id varies from 1 - 20 in number. do you have any suggestions or questions?

Thanks.

Solution

It depends on how many suppliers do you have per customer. Because you always have queries on the customer ID & material ID, then I suggest that at least that columns are making into partition key. If you have too many suppliers, you can move it into partition key as well. And I would avoid having monthly bucket - it will make querying hard.

So you can go with following primary keys:

((customer, material, supplier), date)
((customer, material) supplier, date)

both will allow to have both queries:

select * from table where customer = ... and material = ... and supplier = ... and date = ...
select * from table where customer = ... and material = ... and supplier = ... and date >= start and date <= end

but I would recommend to go with 1st one the partitions won't be too big, and not too small.