I want to design my database correctly, and maybe someone could help me with that.
I have a device that writes around 100 key/value pairs to a table every 3 seconds. Someone suggested storing them like this:
^ timestamp ^ key1 ^ key2 ^ [...] ^ key150 ^
| 12/06/12 | null | 2243466 | [...] | null |
But I think that's completely wrong and not dynamic, because I could have many null values. So I tried to do my best and designed it the way I learned it at school: http://ondras.zarovi.cz/sql/demo/?keyword=tempidi
The problem there is that I write the timestamp for every single value, which means that within each batch of 100 values it is always the same, producing a large amount of redundant data.
Could someone give me a hint on how to reduce the database size? Am I basically correct with my ERM?
If you must implement a key-value store in MySQL, it doesn't make any sense to make it more complicated than this:
create table key_value_store (
  run_time   datetime    not null,
  key_name   varchar(15) not null,
  key_value  varchar(15) not null,
  primary key (run_time, key_name)
);
If the average length of both your keys and values is 10 bytes, you're looking at about 86 million rows and 2.5GB per month, and you don't need any joins. If all your values (column key_value) are either integers or floats, you can change the data type and reduce space a little more.
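The arithmetic behind that estimate is easy to check. A quick sketch, assuming a 30-day month and roughly 30 bytes per row (8 for the datetime plus two ~11-byte varchar fields; index overhead not counted):

```python
SECONDS_PER_MONTH = 30 * 24 * 3600   # assuming a 30-day month
KEYS_PER_SAMPLE = 100                # ~100 key/value pairs every 3 seconds
BYTES_PER_ROW = 8 + 11 + 11          # datetime + two varchar(15) columns at ~10 chars each

rows_per_month = SECONDS_PER_MONTH // 3 * KEYS_PER_SAMPLE
print(rows_per_month)                             # 86,400,000 rows
print(rows_per_month * BYTES_PER_ROW / 1e9)       # ~2.6 GB of raw row data
```

That lands right around the "86 million rows, 2.5GB per month" figure above.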
One of the main problems with implementing key-value stores in SQL is that, unless all values are the same data type, you have to use something like varchar(n) for all values. You lose type safety and declarative constraints. (You can't check that the value for key3 is between 1 and 15, while the value for key7 is between 0 and 3.)
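You can see that loss of type safety directly. A minimal sketch using SQLite (for portability; the same point holds in MySQL): the value column is just text, so a nonsense value for a key that is "supposed to" hold 1-15 is accepted silently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table key_value_store (
        run_time  text not null,
        key_name  text not null,
        key_value text not null,
        primary key (run_time, key_name)
    )
""")

# key3 is supposed to be an integer between 1 and 15, but the database
# has no way to know that -- this insert succeeds without complaint.
conn.execute(
    "insert into key_value_store values ('2012-06-12 10:00:00', 'key3', 'banana')"
)
row = conn.execute("select key_value from key_value_store").fetchone()
print(row[0])  # banana
```

With one column per key, a simple `check (key3 between 1 and 15)` would have rejected that row at write time.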
Is this feasible?
This kind of structure (known as "EAV"--Google that) is a well-known table design anti-pattern. Part of the problem is that you're essentially storing columns as rows. (You're storing column names in key_value_store.key_name.) If you ever have to write out data in the format of a normal table, you'll discover three things.
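"Writing columns back out of rows" means pivoting, typically with one conditional aggregate per key. A sketch in SQLite (the `max(case ...)` pattern works the same way in MySQL) shows why this gets painful; here only two keys are pivoted, but your real query would need a hundred of these branches:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    create table key_value_store (
        run_time  text not null,
        key_name  text not null,
        key_value text not null,
        primary key (run_time, key_name)
    )
""")
conn.executemany(
    "insert into key_value_store values (?, ?, ?)",
    [("2012-06-12 10:00:00", "key1", "17"),
     ("2012-06-12 10:00:00", "key2", "2243466"),
     ("2012-06-12 10:00:03", "key1", "18")],
)

# One max(case ...) branch per key: fine for 2 keys, unwieldy for 100+.
rows = conn.execute("""
    select run_time,
           max(case when key_name = 'key1' then key_value end) as key1,
           max(case when key_name = 'key2' then key_value end) as key2
    from key_value_store
    group by run_time
    order by run_time
""").fetchall()
print(rows)
# [('2012-06-12 10:00:00', '17', '2243466'), ('2012-06-12 10:00:03', '18', None)]
```

Note that the missing key2 in the second batch comes back as NULL anyway, so the EAV shape doesn't even spare you the nulls once you report on the data.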
What I look for
Test before you decide
If I were in your shoes, I'd build this table in both MySQL and PostgreSQL. I'd load each with about a million rows of random-ish data. Then I'd try some queries and reports on each. (Reports are important.) Measure the performance. Increase the load to 10 million rows, retune the server and the dbms, and run the same queries and reports again. Measure again.
Repeat with 100 million rows. Quit when you're confident. Expect all this to take a couple of days.