Search code examples
sqlitedbi

SQLite Query plan


Is there a way to manipulate the query plan generated in SQLite?

I 'l try to explain my problem:

I have 3 tables:

CREATE TABLE "index_term" (
  "id" INT,
  "term" VARCHAR(255) NOT NULL,
  PRIMARY KEY("id"),
  UNIQUE("term"));

CREATE TABLE "index_posting" (
  "doc_id" INT NOT NULL,
  "term_id" INT NOT NULL,
   PRIMARY KEY("doc_id", "field_id", "term_id"),,
   CONSTRAINT "index_posting_doc_id_fkey" FOREIGN KEY ("doc_id")
    REFERENCES "document"("doc_id") ON DELETE CASCADE,
   CONSTRAINT "index_posting_term_id_fkey" FOREIGN KEY ("term_id")
    REFERENCES "index_term"("id") ON DELETE CASCADE);;
CREATE INDEX "index_posting_term_id_idx" ON "index_posting"("term_id");

CREATE TABLE "published_files" (
  "doc_id" INTEGER NOT NULL,,
  "uri_id" INTEGER,
  "user_id" INTEGER NOT NULL,
  "status" INTEGER NOT NULL,
  "title" VARCHAR(1024),
  PRIMARY KEY("uri_id"));
CREATE INDEX "published_files_doc_id_idx" ON "published_files"("doc_id");

about 600.000 entries in the index_term, about 4 Millions in the index_posting and 300.000 in the published_files table.

Now when i want to find the number of unique doc_ids in index_posting which reference some terms i use the following SQL.

 select count(distinct index_posting.doc_id)  from index_term, index_posting 
   where  
    index_posting.term_id = index_term.id and index_term.term like '%test%'

The result is displayed in reasonable time (0.3 secs). Asking Explain Query plan returns

0|0|0|SCAN TABLE index_term
0|1|1|SEARCH TABLE index_posting USING INDEX index_posting_term_id_idx (term_id=?)

When i want to filter the count in the way that it only includes doc_ids of index_posting if there exists a published_files entry:

 select count(distinct index_posting.doc_id)  from index_term, index_posting, 
   published_files where  
    index_posting.term_id = index_term.id and index_posting.doc_id = published_files.doc_id and index_term.term like '%test%'

The query takes almost 10 times as long. Asking Explain Query plan returns

 0|0|1|SCAN TABLE index_posting
 0|1|0|SEARCH TABLE index_term USING INDEX sqlite_autoindex_index_term_1 (id=?)
 0|2|2|SEARCH TABLE published_files AS pf USING COVERING INDEX published_files_doc_id_idx (doc_id=?)

So as far as i understand SQLITE changed here its query plan doing a full table scan of index_posting and a lookup in index_term instead of the other way around.

As a workaround i did do a

 analyze index_posting;
 analyze index_term;
 analyze published_files;

and now it seems correct,

0|0|0|SCAN TABLE index_term
0|1|1|SEARCH TABLE index_posting USING INDEX index_posting_term_id_idx (term_id=?)
0|2|2|SEARCH TABLE published_files USING COVERING INDEX published_files_doc_id_idx (doc_id=?)

but my question is - is there a way to force SQLITE to always use the correct query plan?

TIA


Solution

  • ANALYZE is not a workaround; it's supposed to be used.

    You can use CROSS JOIN to enforce a certain order of the nested loops, or use INDEXED BY to force a certain index to be used. However, you asked for "the correct query plan", which might not be same as the one enforced by these mechanisms.