
Need correct database structure to reduce database size


I want to design my database correctly. Maybe someone could help me with that.

I have a device which writes around 100 key/value pairs to a table every 3 seconds. Someone suggested storing it like this:

| timestamp | key1 | key2    | [...] | key150 |
|-----------|------|---------|-------|--------|
| 12/06/12  | null | 2243466 | [...] | null   |

But I think that's completely wrong and not dynamic, because I could have many null values. So I tried to do my best and designed it the way I learned at school: http://ondras.zarovi.cz/sql/demo/?keyword=tempidi

The problem is that I would write the timestamp for every value, which means that within 100 values it would always be the same and produce a large amount of redundant data.

Could someone give me a hint on how to reduce the database size? Am I basically correct with my ERM?


Solution

  • If you must implement a key-value store in MySQL, it doesn't make any sense to make it more complicated than this.

    create table key_value_store (
      run_time  datetime    not null,   -- one timestamp shared by all keys in a run
      key_name  varchar(15) not null,
      key_value varchar(15) not null,
      primary key (run_time, key_name)  -- one row per key per run
    );
    

    If the average length of both your keys and values is 10 bytes, you're looking at about 86 million rows and 2.5GB per month, and you don't need any joins. If all your values (column key_value) are either integers or floats, you can change the data type and reduce space a little more.
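    To see where those numbers come from, here's the back-of-envelope arithmetic as a query (a sketch; the 30 bytes per row is an assumption: an 8-byte datetime plus two 10-byte strings, ignoring storage overhead):

    select
      (100 / 3) * 60 * 60 * 24 * 30 as rows_per_month,                    -- about 86.4 million
      (100 / 3) * 60 * 60 * 24 * 30 * 30 / pow(1024, 3) as gb_per_month;  -- about 2.4 GB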

    One of the main problems with implementing key-value stores in SQL is that, unless all values are the same data type, you have to use something like varchar(n) for all values. You lose type safety and declarative constraints. (You can't check that the value for key3 is between 1 and 15, while the value for key7 is between 0 and 3.)
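    For contrast, here is a sketch of the per-column constraints a conventional table would give you (note: MySQL parses but does not enforce check constraints before version 8.0.16):

    create table readings (
      run_time datetime not null primary key,
      key3 int check (key3 between 1 and 15),  -- per-column range rule
      key7 int check (key7 between 0 and 3)    -- no equivalent exists in the EAV table
    );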


    Is this feasible?

    This kind of structure, known as "EAV" (entity-attribute-value; Google it), is a well-known table design anti-pattern. Part of the problem is that you're essentially storing columns as rows. (You're storing column names in key_value_store.key_name.) If you ever have to write out data in the format of a normal table, you'll discover three things.

    1. It's hard to write queries to output the right format (see the pivot sketch after this list).
    2. It takes forever to run. If you have to write hundreds of columns, it might never run to completion.
    3. You'll wish you had much faster hardware. Much, much faster hardware.
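
    To make the first point concrete, here is a sketch of the pivot query that EAV forces on you (key names assumed from the question):

    select run_time,
      max(case when key_name = 'key1' then key_value end) as key1,
      max(case when key_name = 'key2' then key_value end) as key2
      -- ...one expression per key, repeated for all ~150 keys
    from key_value_store
    group by run_time;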

    What I look for

    • Opportunities to group keys into logical tables. This has to do with the first design, and it might not apply to you. It sounds like your application is basically storing a log file, and you don't know which keys will have values on each run.
    • Opportunities to reduce the number of rows. I'd ask, "Can we write less often?" So I'd be looking at writing to the database every 5 or 6 seconds instead of every 3 seconds, assuming that means I'm writing fewer rows. (The real goal is fewer rows, not fewer writes.)
    • The right platform. PostgreSQL 9.2 might be a better choice for this. Version 9.2 has index-only scans, and it has an hstore module that implements a key-value store (a sketch follows this list).
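
    For illustration, a minimal sketch of the hstore approach (PostgreSQL syntax, not MySQL; the table and key names are assumptions):

    create extension if not exists hstore;

    create table readings (
      run_time timestamp not null primary key,
      kv hstore not null                       -- only the keys that actually have values
    );

    insert into readings
    values (now(), 'key2=>2243466, key5=>17'::hstore);

    select run_time, kv -> 'key2' as key2      -- pull a single key back out
    from readings;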

    Test before you decide

    If I were in your shoes, I'd build this table in both MySQL and PostgreSQL. I'd load each with about a million rows of random-ish data. Then I'd try some queries and reports on each. (Reports are important.) Measure the performance. Increase the load to 10 million rows, retune the server and the DBMS, and run the same queries and reports again. Measure again.
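    A sketch of one way to generate that test load in MySQL (assumes MySQL 8.0+ for the recursive CTE; on older versions use a numbers table or an external script):

    set session cte_max_recursion_depth = 1000001;  -- default recursion limit is 1000

    insert into key_value_store (run_time, key_name, key_value)
    with recursive seq as (
      select 1 as n
      union all
      select n + 1 from seq where n < 1000000
    )
    select
      date_sub(now(), interval n * 3 second),    -- one run every 3 seconds
      concat('key', 1 + floor(rand() * 150)),    -- random key name, key1..key150
      cast(floor(rand() * 1000000) as char)      -- random value
    from seq;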

    Repeat with 100 million rows. Quit when you're confident. Expect all this to take a couple of days.