Search code examples
database-designgroup-bymariadbdatabase-normalization

Database for Lottery Best Practice Design


I am starting design an database to hold lottery results for study, in my Country I can download all results in excel.

Since the excel is all in on spreadsheet, I willing do some normalization for import that data, my question is about if my design is right focusing in performance, and how to deal with Group By, I am not sure, but I think I am failing badly on last one.

My initial design is:

For The Results:

CREATE TABLE `Game_Results` (
    `Id` int UNSIGNED NOT NULL,
    `Date_Game` date NOT NULL,
    `Ball_01` tinyint UNSIGNED NOT NULL,
    `Ball_02` tinyint UNSIGNED NOT NULL,
    `Ball_03` tinyint UNSIGNED NOT NULL,
    `Ball_04` tinyint UNSIGNED NOT NULL,
    `Ball_05` tinyint UNSIGNED NOT NULL,
    `Ball_06` tinyint UNSIGNED NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC;

For the Prize Data:

CREATE TABLE `Game_Prize` (
    `Game_Id` int UNSIGNED NOT NULL,
    `Total_Bets` decimal UNSIGNED NOT NULL,
    `Winners` smallint UNSIGNED NOT NULL DEFAULT '0',
    `Prize_Per_winner` decimal UNSIGNED NOT NULL DEFAULT '0'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC;

Location of Winners:

CREATE TABLE `Prize_Location` (
    `Game_Id` int UNSIGNED NOT NULL,
    `State_Prize` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL,
    `City_Prize` varchar(255) COLLATE utf8mb4_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=DYNAMIC;

This design is based on desired webpage, lets say:

One div With Last result, so query will be:

SELECT `Ball_01`, `Ball_02`, `Ball_03`, `Ball_04`, `Ball_05`, `Ball_06` FROM Game_Results WHERE Id IN (SELECT MAX(Id) FROM Game_Results)

And Current Id will be global in php / Javascript

Another div in same page with the info of prize, so query will be:

 SELECT `Total_Bets`, `Winners`,  `Prize_Per_winner` FROM Game_Prize Where Game_Id = Id (from php / javascript)

If the person click on Winners then an modal would open with location data.

This time I can do an SELECT on last table and list that, theres no problem till here.

But I will try build some custom filters, and then the problems come out, my filters would let people use the features:

  • Whats the Ball who come out mote times?
  • List ordered balls by how many time come out.
  • Whats the State where more people won?
  • How many people won in one specific city?
  • Etc..

And the problem with duplicate that come outs with the possibility of export the result in PDF or DOCX, I will search for any javascript plugin to convert JSON on those, at this point, I am willing remake same structure of original Excel (I know about JOINS), example:

GameID = 55, ... , State = A, City = A (1 Person)

, ... , State A, City = A (1 Person)

, ... , State A, City = A (1 Person)

GameID = 56, ... , State = C, City = H

GameID = 57, ... , , , (No Winners)

The original Excel have all in same line mescling the cells for same game. Before the equal symbol (GameID, State, City, etc...) are the first line of columns, after the equal symbol are the data on botton, so GameID is mescling on table, like GROUP BY.

So, how should I deal with the last table ('Prize_Location') since sometimes there 3 or 6 Winners?

I mean, sometime people from same city and state would won, in this case how to deal with this situation?

And offcourse, this is the best way to design this database?

PS.: Index, Primary Keys, etc... I will do after choose an optimal design. The focus would be performance since I would share this data on my blog, and allow people filter and query this data, in another words, I have no idea how many people should use it too at sametime.


Solution

  • SELECT `Ball_01`, `Ball_02`, `Ball_03`, `Ball_04`, `Ball_05`, `Ball_06`
        FROM Game_Results WHERE Id IN (SELECT MAX(Id) FROM Game_Results)
    

    is better done this way:

    SELECT `Ball_01`, `Ball_02`, `Ball_03`, `Ball_04`, `Ball_05`, `Ball_06`
        FROM Game_Results  ORDER BY Id DESC  LIMIT 1
    

    Be cautious about using a naked DECIMAL. Consider using some form of INT for whole numbers.

    Are the balls ordered or not?

    You should consider having 6 rows in a table dedicated to the balls instead of having 6 columns. This will simplify calculating stats on ball numbers.

    I don't see much need for normalizing Locations. Just have these in the prize table:

    state CHAR(2)     CHARACTER SET ascii,
    city  VARCHAR(40) CHARACTER SET ascii
    

    (255 and utf8mb4 are unnecessary).

    You can still discover multiple winners in a city via:

    SELECT city, state, COUNT(*)
        FROM prizes
        GROUP BY city, state
        HAVING COUNT(*) > 1