Search code examples
c++databaseperformancesqlitequery-optimization

Reduce the time needed for a database to be read


I'm writing a program in which I have to read a huge amount of data from a database (Sqlite) and then display the obtained values in a window. I need to refresh the windows each second, so the operation of reading the all data need to last less then a second.

The database is composed by two columns a primary key and a value I want to display (a number), the primary key is simply a progressive number that goes from 1 to the number of rows. The primary key is not important for me as far the retrived data are sorted in a vector/struct.

The database has about 8*10^7 rows and it's size on the disk is about 250MB.

I start with the idea that with transaction I can reduce the time needed to read the data, I try to implement a small program and I measure the time, it was about 20 sec. Then I understand that transaction are not very effective with select statement (Here)

So I try with:

using Record = std::vector<std::string>;
using Records = std::vector<Record>;

int select_callback(void *p_data, int num_fields, char **p_fields, char **p_col_names)
{
  Records* records = static_cast<Records*>(p_data);
  try {
    records->emplace_back(p_fields, p_fields + num_fields);
  }
  catch (...) {
    // abort select on failure, don't let exception propogate thru sqlite3 call-stack
    return 1;
  }
  return 0;
}

Records select_stmt(const char* stmt)
{
  Records records;  
  char *errmsg;
  int ret = sqlite3_exec(db, stmt, select_callback, &records, &errmsg);
  if (ret != SQLITE_OK) {
    std::cerr << "Error in select statement " << stmt << "[" << errmsg << "]\n";
  }
  else {
    std::cerr << records.size() << " records returned.\n";
  }

  return records;
}

Records records = select_stmt("SELECT * FROM TABLE");

I have fastest results (about 14sec) but It was not enough.

Is it possible to reach such reading time (less than 1 sec)?

OS: Ubuntu 20.10 Language: C++


The code I used to get the data with prepare and transaction statement

std::string msg = "SELECT ZPOS FROM TABLE WHERE ID = ?";

const char* msg_char_ptr = msg.c_str();

memset(sSQL, '\0', BUFFER_SIZE);

sprintf(sSQL, msg_char_ptr);

int rc = sqlite3_prepare_v2(db,  sSQL, BUFFER_SIZE, &stmt, &tail);

int count = 0;

if(rc == SQLITE_OK ) {

    sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, &sErrMsg);

    for (int i = 0; i < number_of_rows; i++)
    {

        // indices vector with all the indices of the table
        sqlite3_bind_int(stmt, 1, indices[i]); 

        while ( sqlite3_step( stmt ) == SQLITE_ROW ) { 

            int result = sqlite3_column_int( stmt, 0 );
            
            // collectData vector with the result
            collectData[count] = result;

            int val = sqlite3_column_count( stmt );

            count ++;

        }

        sqlite3_step(stmt);
        sqlite3_clear_bindings(stmt);
        sqlite3_reset(stmt);
    }

} else {

    printf("SQLite prepare error.\n");

}       

sqlite3_exec(db, "END TRANSACTION", NULL, NULL, &sErrMsg);
sqlite3_finalize(stmt);

Solution

  • If you must SELECT many rows with SQL, your best bet is to do it with one SQL SELECT statement to get all the rows you want, in a multi-row result set. Instead of issuing one SELECT for each ID value you want, issue SELECT ID, ZPOS FROM TABLE` and set up your code to read multiple rows.

    That being said, I guess you're trying to read 80 megarows (250MiB) from your table into RAM in your app once a second. That's too much data to read every second. Even if you wrote your C++ code to simply open a flat file and read 250MiB, doing it once a second would be too often.

    Some design points to keep in mind:

    1. There's no magic in SQL that makes it super fast, faster than file-reading, to access mass quantities of data.

    2. The point of SQL is to allow programs like yours to handle very large sets of data a few rows at a time. You certainly can do stuff like

      SELECT ID, ZPOS FROM TABLE WHERE ID > ?
      

      where the value you put in ? with binding is the largest ID value you received last time you read the database. That gets you the new records.

    3. With respect, you need to design your program to cache the data from your table in RAM and use the cache effectively. Reloading all the data once a second is too often.