Search code examples
database-designnosqlcassandracolumn-family

NoSQL (cassandra) data model for my project


I need to have your opinion on my design of data model for my project. My project is real-time recommendation system. There are some collections of recommendation algorithms. it means something like this:

collection1 { algorithm1, algorithm5, algorithm6 }

collection2 { algorithm5, algorithm6, algorithm7, algorithm8 }

etc.

and I need to store data like - success, probability of choosing, score and so on to every algorithm (every 2 minutes) in collection. So I have chosen Cassandra like my data storage because it works fine with time series. I need to have my data stored to be displayed later in some graphs and charts. Do you think that my solution of data model is ok? I made it in this way:

CREATE TABLE algorithm_by_collection_and_date (
   algorithm_id text,
   collection_id text,
   date text,
   event_time timestamp,
   score double,
   probability double,
   PRIMARY KEY ((algorithm_id,collection_id,date),event_time)
);

So it is designed like row partitioning by adding data to the row key to limit the amount of columns (by date) I per algorithm in collection.

What do you think about this? Thanks, Jan


Solution

  • I would have this structure - it will allow you to normalise your structure and make it a cleaner design. I have rushed this, please add in the correct data types for the columns and also the referential integrity constraints.

    CREATE TABLE algorithm
    (
        algorithmId uuid PRIMARY KEY,
        algorithmName text
    )
    
    CREATE TABLE collection
    (
        collectionID uuid PRIMARY KEY,
        collectionName text
    )
    
    CREATE TABLE algo_collection
    (
        algoCollectionID uuid PRIMARY KEY
        collectionID
        algorithmID
    )
    
    CREATE TABLE recommendation
    (
        algoCollectionID 
        date    
        event_time,
        score,
        probability
    
    )