Search code examples
database-designcloudkit

Database structure of quiz system


I'm building a quiz system and i'm not sure how to structure it on the database side.

Let's say the quiz system is for trees species. User will be presented a tree picture (an entry), and will have to select the correct species among a list of 6 (5 random, 1 correct). There are 250 species in the quiz.

The two main problems I need to solve:

1) User should never see again the same tree entry. 2) Owner of a tree entry, must be able to see stats of all the votes on his entry: so he will see number of votes, per tree species, entered by other users on his entry. Because users will make mistakes on votes, owner of entry will see stats of those mistakes and which incorrect species his entry was mistaken for.

An easy way to do it, would be to create several tables:

  • Users table
  • Trees table: With foreign key to User table. This is entries users will have to vote on. One user can create one entry.
  • Votes tables: With foreign key to both user table and tree table. This table will hold each and every vote, including tree species name selected by the user on that vote.

So to solve my problem:

1) Prevent user to see same entry again: Before fetching "Trees" to present new trees to vote on, run a first query to get all "Votes" for that user to exclude "Trees" the user has already voted on.

2) See stats of votes on own entry: Fetch all votes pointing to own entry, then compute stats by tree species selected.

This approach is straightforward, but what bothers me is having to create an entry for each and every vote. There can be lots of it. I'm inexperienced with db design, so not sure if this could be a problem or not.

Would this alternative way be better:

1) To keep track of already voted entries: On a single record (per user), hold in a single field, ALL voted ids, so instead of running a query to prefetch each votes entries, simply get them from a single field. Not sure if this is doable, and what is the limit size of a single field that could potentially hold thousands of votes which are in fact ids of entries voted on.

2) So owner of an entry can view that entry statistics: On EACH tree entry, create 250 columns, one per tree species, and increment an INT on each depending on what is voted on. So to get stats of each entry, it would only be a matter of fetching one single record and do the computation client side.

What would be the best approach? Is there one that should be avoided?


Solution

  • The best approach would definitely be the first one you've described - Having a table for users, a table for trees, and a table for votes.

    The other alternatives are terrible and unscalable - in that order:

    1. Having a single column store multiple data points is 99.999% of the time the wrong thing to do, and 99.999% of the time when someone decide to do it anyway, it's done in the most terrible way possible - using a delimited string.
      Every database have the proper data type to store multiple data points - it's called a table. Some databases also supports types like XML, Json or arrays, but these should be used carefully as most of the time, they are a worst alternative than adding another table.

    2. Having a table with 250 columns restricts your quiz to a maximum number of option - 250 to be exact.
      If you want to add even one more option, you have to add a column to it.
      Wide tables are useful when they store entities with many different properties, and even then they should be considered only after the other alternatives have been exhausted, because they are usually a maintenance nightmare.


    That being said, there are things to consider in way you should use the first, normalized design:

    1. User should never see again the same tree entry - An easy solution would be to select a random tree, where the tree id does not exist in the votes table with the user id that's getting the query - basically something like this (I use T-SQL for my example, you should use whatever SQL dialect supported by your RDBMS of choice, of course):
    SELECT TOP 1 Id, Picture, Name -- return only one record
    FROM Trees
    WHERE NOT EXIST 
    (
        SELECT 1
        FROM Votes
        WHERE Votes.TreeId = Trees.TreeId
        AND UserId = @UserId
    )
    ORDER BY NewID(); -- random order 
    
    1. See stats of votes on own entry - This can probably be accomplished using aggregating functions such as SUM, COUNT etc' on the database level, there's probably no need to fetch all the rows and calculate this data row by row.