I know (from the answers of this question) that Sqlite by default doesn't enable compression. Is it possible to enable it, or would this require another tool? Here is the situation:
I need to add millions of rows in a Sqlite database. The table contains a description
column (~500 char on average), and on average, each description
is shared by, say, 40 rows, like this:
id name othercolumn description
-------------------------------------------------
1 azefds ... This description will be the same for probably 40 rows
2 tsdyug ... This description will be the same for probably 40 rows
...
40 wxcqds ... This description will be the same for probably 40 rows
41 azeyui ... This one is unique
42 uiuotr ... This one will be shared by 60 rows
43 poipud ... This one will be shared by 60 rows
...
101 iuotyp ... This one will be shared by 60 rows
102 blaxwx ... Same description for the next 10 rows
103 sdhfjk ... Same description for the next 10 rows
...
Question:
Would you just insert rows like this, and enable a compression algorithm of the DB?
Pros:
or
Would you use 2 tables?
id name othercolumn descriptionid
-------------------------------------------------
1 azefds ... 1
2 tsdyug ... 1
...
40 wxcqds ... 1
41 azeyui ... 2
...
id description
-------------------------------------------------
1 This description will be the same for probably 40 rows
2 This one is unique
Cons:
instead of the simple select id, name, description from mytable
from solution #1, we have to use 2 tables, and multiple queries to retrieve data. Or maybe is it possible to do it without a complex query, but with a clever query with union
or merge
or anything like this?
also, if some description
are no longer used, we have to periodically clean up the 2nd table to remove obsolete entries
Using multiple tables will not only prevent inconsistency, and take less space, but may also be faster, even if multiple/more complex queries are involved (precisely because it involves moving less data around). Which you should use depends on which of those characteristics are most important to you.
A query to retrieve the results when you have 2 tables would look something like this (which is really just a join between the two tables):
select table1.id, table1.name, table1.othercolumn, table2.description
from table1, table2
where table1.descriptionid=table2.id