I am designing a system that takes student's modules and groups them together in classes based on their department and subjects chosen.
Let's for example take the following table:
Student ID | Department | Choice 1 | Choice 2 | Choice 3 | Choice 4 |
---|---|---|---|---|---|
1 | Languages | English | Literature | Spanish | Italian |
2 | Languages | English | Literature | Spanish | Italian |
3 | Languages | English | Literature | Spanish | German |
4 | Languages | English | Literature | Spanish | French |
5 | Science | Math | Physics | Chemistry | Biology |
6 | Science | Math | Physics | Chemistry | Computing |
Let's say that each class can have 2 students.
With the above example 3 classes need to be created (1,2), (3,4) and (5,6). (Note: students grouped together must be on the same department)
Can you give me hints how to design the database so it can allow for a number of groups/cluster to be created. The application gonna be really big with a lot of entries.
I thought of using a Vector Database for this but I am not sure if it is the best option
I would appreciate any helpful hints, links or books
As you say you will have a lot of data and you might need vector capabilities, then Cassandra would make sense. You could start now on the free tier of AstraDB which is a managed cassandra service.
The denormalized data-model would have 3 tables and look like this:
First table to read student information
CREATE TABLE IF NOT EXISTS student_by_ID (
Student_ID UUID PRIMARY KEY,
Department TEXT,
Choices LIST<TEXT>
);
Note if the number of choice will always be 4, then create 4 columns (choice1, choice2...)
Second table would allow you to retrieve a classe per id. Adding student_ID
as a clustering column allow you to retrieve easly all the students for a given class.
CREATE TABLE IF NOT EXISTS classe_by_ID (
Class_ID UUID,
Student_ID UUID,
Department TEXT,
...
PRIMARY KEY (Class_ID, Student_ID)
);
Then we could have a last table for departments
CREATE TABLE IF NOT EXISTS Department_by_ID (
Department_Name TEXT,
Class_ID UUID,
PRIMARY KEY (Department_Name, Class_ID)
);
Astra has vector search capabilities (that will be in Cassandra 5.0) so you can add vector
type attribute to your tables to perform similarity search on it. (check this video)