I am trying to speed up a sqlite3 query, currently it is quite slow.
I have a table of ~22 million rows with two columns (uid, info)
In python, I then find a list of ~10,000 uid values that correspond to uid's in the table above. (randomly generated numbers for this post)
import numpy as np
import pandas as pd
import sqlite3
conn = sqlite3.connect('mydb.db')
uids = np.random.random(10000)
uids = list(map(int,list(map(round,uids*2000000))))
sql_query = 'SELECT * FROM mytable WHERE uid IN (' + ','.join(map(str, uids)) + ')'
sqlpd = pd.read_sql_query(sql_query,conn)
The query works, but it is quite slow (~15 seconds). Wondering how I can speed this up, I am guessing it is the WHERE ... IN methodology that is causing the issue
Not sure why, but I dropped the table and rebuilt with the same code/data. Now it is running in 0.1 seconds.
Problem solved, not sure why :/