Search code examples
pythonsqlsqlitepandaswhere-in

Slow WHERE IN SQL query using pandas python


I am trying to speed up a sqlite3 query, currently it is quite slow.

I have a table of ~22 million rows with two columns (uid, info)

In python, I then find a list of ~10,000 uid values that correspond to uid's in the table above. (randomly generated numbers for this post)

import numpy as np
import pandas as pd
import sqlite3

conn = sqlite3.connect('mydb.db')

uids = np.random.random(10000)
uids = list(map(int,list(map(round,uids*2000000))))

sql_query = 'SELECT * FROM mytable WHERE uid IN (' + ','.join(map(str, uids)) + ')'
sqlpd = pd.read_sql_query(sql_query,conn)

The query works, but it is quite slow (~15 seconds). Wondering how I can speed this up, I am guessing it is the WHERE ... IN methodology that is causing the issue


Solution

  • Not sure why, but I dropped the table and rebuilt with the same code/data. Now it is running in 0.1 seconds.

    Problem solved, not sure why :/