For an Android word game (with minSdkLevel=9
meaning SQLite version 3.6.22) -
I would like to deliver the dictionary as a prefilled SQLite table within the APK file (with the help of SQLiteAssetHelper).
In the SQLite database there will be just 1 table:
create table dict ( /* contains 700 000 unique words */
word text not null
);
My question please:
How to declare the table for the best performance and which kind of SQL-query to use?
(When checking if a word entered by player is present in the dict table or not - that will be the main usage of the SQLite database in the app).
Should I create index (is it possible to have index for text
columns at all)?
Or should I declare the word column as primary key?
Also, some SQLite for Android guides suggest to have an _id
column in each table (probably to enable fetching the last inserted record? - which I don't really need here). Should I maybe use
create table dict (
_id integer primary key,
word text unique not null
);
create index word_index on dict(word);
or will that be a waste of 4 x 700 000 bytes? (Or is it added as _rowid_
anyway?)
Quick answer: yes, you can create index on text column.
However for best performance, this may not be the best option. Because the index created by SQLite should be simply a b-tree (binary tree), which speed up the search by binary search. i.e. with 700k words, the binary search has to run about 20 intervals. But this could be fast enough, you need to test it to actually know the performance.
Some alternative methods would be to create multiple tables (buckets), e.g. create table as wordA, wordB, wordC etc. And use the first character to determine which table the word is put. This drops the size of each table to contains about 27k records. (of course each bucket is not of equal size)
By doing this, it reduces the interval used performing the binary search.
And actually you should use hash function to determine the bucket, which makes the size of each buckets more balanced and you can freely control the number of buckets.
And you have to actually fine tune to know what is the optimal bucket size.