Search code examples
database-designduplicate-data

Having duplicate records on a Database, is always wrong?


Context: The users create challenges for some subject, level, in one city, a certain day at a certain time.

The user may have more of a challenge of the same subject, level, in the same city, same day and same time (challenges being played simultaneously).

Challenges can be created one at a time.

On my actual table, I have duplicate records for this situation (with different primary key value). Is this wrong? How should it be?

Table challenges:

id_challenge  INT (pk auto increment)
id_subject INT
id_level INT
id_city INT
id_user INT
date DATE
time TIME
comment VARCHAR(100)

Solution

  • Technically, I do not believe these records are "duplicate" per se, as they are referring to two distinct entities - even if the comments are the same. I am of course referring to the case where challenges #79 and #80 are treated separately, even if they have the same data in them. That is, on a list of a user's challenges they are participating in, they could list both challenges as two separate lines. The same metaphor extends to a list of challenges in a city.

    If, however, you find yourself grouping, counting, or sorting on these rows to remove duplicates, this is a smell. Even if the UI is only allowing you to create one at a time, each new combination would be more of a "profile" which can be invoked multiple times. Either storing a quantity or having a parent "challenge profile" table may be a better option.

    Additionally, if the number of "duplicate" rows is in the millions for each given combination, you may see some size benefit in forcing the issue, declaring them duplicate, and normalizing. But even then, storage is cheap and with a table this small a million records is only a few MB.


    But...IMHO the fact that users are creating multiple rows one-by-one in this fashion may indicate something that needs to be redesigned - possibly both in the front and back ends.