
Improving Android SQLite query performance for relational tables


Scenario:

I am working with what I think is a fairly large SQLite database (around 20 MB, around 50 tables) in my Android app.

Most of these tables are linked by foreign keys, and a lot of the time, I need to retrieve information from two or more tables at a time. To illustrate an example:

Table1:

Id  |  Name  |  Attribute1  |  Attribute2  |  ForeignKey

1   |  "Me"  |  SomeValue   |  AnotherVal  |     49
2   |  "A"   |     ...      |     ...      |     50
3   |  "B"   |              |              |     49

Table2:

Id  |  Attribute3  |  Attribute4  |  Attribute5

49  |   ThirdVal   |  FourthVal   |   FifthVal
50  |     ...      |     ...      |     ...

Sometimes, there are more than two tables that link together in this way. Almost all of the time, there are more columns than those presented above, and there are usually around 1000 rows.

My aim is to display a few of the attributes from the database as items in a RecyclerView, but I will need to use both tables to retrieve these attributes.


My method:

Currently, I am using the android-sqlite-asset-helper library to copy this database (.db extension) from the assets folder into the app. When I recorded the time for this copying to happen, it completed in 732 ms, which is fine.

However, when I want to retrieve the data from two tables using the foreign key from the first table, it takes far too long. It took around 11.47 seconds when I tested this, and I want to speed this up.

The way in which I retrieve the data is that I read each row in the first table, and put it into an object:

public static ArrayList<FirstItem> retrieveFirstItemList(Context context) {
    Cursor cursor = new DbHelper(context).getReadableDatabase()
            .query(DbHelper.TABLE_NAME, null, null, null, null, null, null);
    ArrayList<FirstItem> arrayList = new ArrayList<>();
    cursor.moveToFirst();
    while (!cursor.isAfterLast()) {
        // I read all the values from each column and put them into variables
        arrayList.add(new FirstItem(id, name, attribute1, attribute2, foreignKey));
        cursor.moveToNext();
    }
    cursor.close();
    return arrayList;
}

The FirstItem object would contain getter methods in addition to another used for getting the SecondItem object from the foreign key:

public SecondItem getSecondItem(Context context) {
    Cursor cursor = new SecondDbHelper(context).getReadableDatabase().query(
            SecondDbHelper.TABLE_NAME,
            null,
            SecondDbHelper.COL_ID + "=?",
            new String[] {String.valueOf(mForeignKey)},
            null, null, null);
    cursor.moveToFirst();
    SecondItem secondItem = new SecondItem(mForeignKey, attribute3, attribute4, attribute5);
    cursor.close();
    return secondItem;
}

When I print values from both tables into the logcat (I have decided not to use any UI for now, to test database performance), I use something like this:

for (FirstItem firstItem : DBUtils.retrieveFirstItemList(this)) {
    Log.d("First item id", firstItem.getId());
    Log.d("Second item attr4", firstItem.getSecondItem(this).getAttribute4());
}

I suspect there is something wrong with this method as it needs to search through Table2 for each row in Table1 - I think it's inefficient.


An idea:

I have one other method I am considering using, however I do not know if it is better than my current solution, or if it is the 'proper' way to achieve what I want. What I mean by this is that I am unsure as to whether there is a way I could slightly modify my current solution to significantly increase performance. Nevertheless, here is my idea to improve the speeds of reading data from the database.

When the app loads for the first time, data from various tables of the SQLite database would be read then put into one SQLite database in the app. This process would occur when the app is run for the first time and each time the tables from the database are updated. I am aware that this would result in duplication of data across different rows, but it is the only way I see that would avoid me having to search multiple tables to produce a list of items.

// read values from SQLite database and put them in arrays

ContentValues cv = new ContentValues();

// put values into variables

cv.put(COL_ID, id);
...
db.insert(TABLE_NAME, null, cv);

Since this process would also take a long time (as there are multiple rows), I was a little concerned that this would not be the best idea, however I read about transactions in some Stack Overflow answers, which would increase write speeds. In other words, I would use db.beginTransaction();, db.setTransactionSuccessful(); and db.endTransaction(); appropriately to increase the performance when rewriting the data to a new SQLite database.
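
A minimal sketch of the transaction pattern mentioned above. `SQLiteDatabase` does provide `beginTransaction()`, `setTransactionSuccessful()` and `endTransaction()`; the `Db` interface, the `insertRow` method and the `BulkInsert` class are hypothetical stand-ins, used only so that the sketch compiles outside Android.

```java
import java.util.List;

// Hypothetical stand-in that mirrors the transaction methods of
// android.database.sqlite.SQLiteDatabase (same method names).
interface Db {
    void beginTransaction();
    void setTransactionSuccessful();
    void endTransaction();
    void insertRow(String[] row); // placeholder for db.insert(TABLE_NAME, null, cv)
}

final class BulkInsert {
    // Writes all rows inside one transaction, so the commit cost is paid
    // once instead of once per row.
    static void insertAll(Db db, List<String[]> rows) {
        db.beginTransaction();
        try {
            for (String[] row : rows) {
                db.insertRow(row);
            }
            // Reached only if every insert succeeded; if an exception skips
            // this call, endTransaction() rolls the transaction back.
            db.setTransactionSuccessful();
        } finally {
            db.endTransaction();
        }
    }
}
```

The `try`/`finally` shape matters: `endTransaction()` has to run even when an insert throws, otherwise the transaction stays open.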

So the new table would look like this:

Id  |  Name  |  Attribute1  |  Attribute2  |  Attribute3  |  Attribute4  | Attribute5

1   |  "Me"  |  SomeValue   |  AnotherVal  |   ThirdVal   |   FourthVal  |  FifthVal
2   |  "A"   |     ...      |     ...      |     ...      |     ...      |     ...
3   |  "B"   |  SomeValue   |  AnotherVal  |   ThirdVal   |   FourthVal  |  FifthVal

This means that although there would be more columns in the table, I would avoid having to search through multiple tables for each row in the first table, and the data would be more easily accessible too (for filtering and things like that). Most of the 'loading' would be done at the start, and hopefully sped up with methods for transactions.


Overview:

To summarise, I want to speed up reading from an SQLite database with multiple tables, where I have to look through these tables for each row of the first table in order to produce the desired result. This takes a long time, and is inefficient, but I'm not sure if there is a way I can adjust my current method to greatly improve read speeds. I think I should 'load' the data when the app is first run, by reorganising the data from various tables into one table.

So I am asking, which of the two methods is better (mostly concerning performance)? Is there a way I can adjust my current method or is there something I am doing incorrectly? Finally, if there is a better way to do this than the two methods I have already mentioned, what is it and how would I go about implementing it?


Solution

  • A couple of things that you should try:

    • Optimise your loading. As far as I understood your current method, it runs into the N + 1 queries problem. You have to execute a query to get the first batch of data, and then another query for every row of the original result set, so you can fetch the related data. It's normal that you get a performance problem with that approach. I don't think it's scalable and I would recommend you move away from it. The easiest way is to use joins instead of multiple queries. This is referred to as eager loading.
    • Introduce appropriate indexes on your tables. If you are performing a lot of joins, you should really think about speeding them up. Indexes are the obvious choice here. Normally, primary key columns are indexed by default, but foreign keys are not. This means that you perform linear searches on your tables for each join, and this is slow. I would try and introduce indexes on your foreign key columns (and all columns that are used in joins). Try to measure the performance of a join before and after to see if you have made any progress there.
    • Consider using database views. They are quite useful when you have to perform the same joins often. A view gives the join a name, so you can query its result like a table instead of rebuilding the join in code each time. You can try executing the query using joins and against a view to see whether there is any difference in time; in SQLite the view itself is not a cached result. The downside of this is that it is a bit harder to map your result set to a hierarchy of Java objects, but, at least in my experience, the gain is worth it.
    • You can try and use some kind of lazy loading. Defer loading the related data unless it is being explicitly requested. This can be hard to implement, and I think that it should be your last resort, but it's an option nevertheless. You may get creative and leverage dynamic proxies or something like this to actually perform the loading logic.
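
As a rough illustration of the join option from the list above, the sketch below builds a single SELECT that returns columns from both tables. The table and column names are taken from the example tables in the question; the `JoinQuery` class and the aliases `t1` and `t2` are made up for this sketch.

```java
final class JoinQuery {
    // Table and column names follow the example tables in the question.
    private static final String FROM_AND_JOIN =
            " FROM Table1 AS t1 JOIN Table2 AS t2 ON t2.Id = t1.ForeignKey";

    // Builds one SELECT that returns the requested columns from both tables,
    // replacing the separate Table2 lookup for every Table1 row.
    static String sql(String... columns) {
        return "SELECT " + String.join(", ", columns) + FROM_AND_JOIN;
    }
}
```

On Android the resulting string could be passed to `SQLiteDatabase.rawQuery(sql, null)`. Each cursor row then already carries the values from both tables, so the per-row `getSecondItem` call is no longer needed. Listing only the columns you display, instead of selecting everything, also keeps the cursor smaller.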

    To summarise, being smart with indexes / views should do the trick most of the time. Combine this with eager / lazy loading, and you should be able to get to the point where you are happy with your performance.
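
If you do go for lazy loading, one simple way to sketch it (without dynamic proxies) is a small wrapper that runs the query on first access and caches the result. `Lazy` is a hypothetical helper written for this example, not an Android or SQLite class.

```java
import java.util.function.Supplier;

// Hypothetical helper: runs the loader on first access and caches the result.
final class Lazy<T> {
    private Supplier<T> loader;
    private T value;
    private boolean loaded;

    Lazy(Supplier<T> loader) {
        this.loader = loader;
    }

    synchronized T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loader = null; // let the loader and anything it captured be collected
        }
        return value;
    }
}
```

A `FirstItem` could then hold a `Lazy<SecondItem>` whose loader performs the lookup from `getSecondItem`, so the query runs at most once per item and only for items that are actually displayed.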

    EDIT: Info on Indexes, Views and Android Implementation

    Indexes and views are not alternative solutions to the same problem. They have different characteristics and applications.

    When you apply an index to a column, you speed up searches on that column's values. You can think of it as a linear search vs. a tree search comparison. This speeds up joins, because the database can quickly find which rows correspond to the foreign key value in question. Indexes have a beneficial effect on simple select statements as well, not only ones using joins, since they also speed up the evaluation of where clause criteria. They come with a catch, though. Indexes speed up queries, but they slow down insert, update and delete operations (since the indexes have to be maintained as well).
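
To make the linear vs. tree comparison concrete, here is a small analogy in plain Java. It is not how SQLite is implemented (SQLite indexes are B-trees); a binary search over a sorted array is used only to show how the number of comparisons grows. The class and method names are made up for this sketch.

```java
final class IndexAnalogy {
    // Full scan: compare against each key in turn until a match is found.
    static int linearSteps(int[] keys, int wanted) {
        int steps = 0;
        for (int key : keys) {
            steps++;
            if (key == wanted) {
                break;
            }
        }
        return steps;
    }

    // Ordered lookup: halve the remaining range on each comparison.
    static int orderedSteps(int[] sortedKeys, int wanted) {
        int lo = 0;
        int hi = sortedKeys.length - 1;
        int steps = 0;
        while (lo <= hi) {
            steps++;
            int mid = (lo + hi) >>> 1;
            if (sortedKeys[mid] == wanted) {
                break;
            }
            if (sortedKeys[mid] < wanted) {
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return steps;
    }
}
```

With 1000 sorted keys, about the table size mentioned in the question, the full scan needs up to 1000 comparisons to find a key, while the ordered lookup needs at most 10.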

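    Views are stored, named queries, whose result sets you can query just like a normal table. The gain here is that you write and validate the join once, in the schema, instead of repeating it in every place that needs it. Note that SQLite stores only the view's definition and runs the underlying query each time the view is used, so a view on its own should not be expected to be faster than the equivalent join; measure both.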

    You should not limit yourself to just one of the two things. They are not mutually exclusive and can give you optimal results when combined.

    As far as Android implementation goes, there is not much to do. SQLite supports both indexes and views out of the box. The only thing you have to do is create them. The easiest way is to modify your database creation script to include CREATE INDEX and CREATE VIEW statements. You can create an index in the same script right after its table, or add it later with a separate CREATE INDEX statement if you need to update an already existing schema. Just check the SQLite manual for the appropriate syntax.
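
For illustration, the two statements might look like the sketch below. The table and column names come from the example tables in the question; the index name `idx_table1_foreignkey`, the view name `Table1WithTable2` and the `SchemaExtras` class are invented for this example.

```java
final class SchemaExtras {
    // Index on the foreign-key column of Table1.
    static final String CREATE_INDEX =
            "CREATE INDEX IF NOT EXISTS idx_table1_foreignkey ON Table1(ForeignKey)";

    // View that stores the join once so callers can SELECT from it like a table.
    static final String CREATE_VIEW =
            "CREATE VIEW IF NOT EXISTS Table1WithTable2 AS "
          + "SELECT t1.Id AS Id, t1.Name AS Name, t2.Attribute4 AS Attribute4 "
          + "FROM Table1 AS t1 JOIN Table2 AS t2 ON t2.Id = t1.ForeignKey";

    // Statements in the order they would be executed.
    static String[] statements() {
        return new String[] { CREATE_INDEX, CREATE_VIEW };
    }
}
```

On Android each statement could be run with `SQLiteDatabase.execSQL(String)`. Since the database in the question ships prebuilt in the assets folder, another option is to add the index and view to that file with a desktop SQLite tool before packaging it.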