Search code examples
mysqlrelational-databasestoring-databigdatadatabase

How do databases handle redundant values?


Suppose I have a database with several columns. In each column there are lots of values that are often similar.

For example I can have a column with the name "Description" and a value could be "This is the description for the measurement". This description can occur up to 1000000 times in this column.

My question is not how I could optimize the design of this database but how a database handles such redundant values. Are these redundant values stored as effectively as with a perfect design (with respect to the total size of the database)? If so, how are the values compressed?


Solution

  • The only correct answer would be: depends on the database and the configuration. Because there is no silver bullet for this one. Some databases do only store values of each column once (some column stores or the like) but technically there is no necessity to do or not do this.

    In some databases you can let the DBMS propose optimizations and in such a case it could possibly propose an ENUM field that holds only existing values, which would reduce the string to an id that references the string. This "optimization" comes at a price, for example, when you want to add a new value in the field description you have to adapt the ENUM field.

    Depending on the actual use case those optimizations are worth nothing or are even a show stopper, for example when the data changes very often (inserts or updates). The dbms would spend more time managing uniqueness/duplicates than actually processing queries.

    On the question of compression: also depends on the configuration and the database system I guess, depends on the field type too. text data can be compressed and in the case of non-indexed text fields there should be almost no drawback in using a simple compression algorithm. Which algorithm depends on the dbms and configuration, I suspect.

    Unless you become more specific, there is no more specific answer, I believe.