Search code examples
mysqlmariadbpymysql

Making thousands of SELECT queries faster


Situation

  • Working with Python 3.7.2
  • I have read previlege of a MariaDB table with 5M rows on a server.
  • I have a local text file with 7K integers, one per line.
  • The integers represent IDXs of the table.
  • The IDX column of the table is the primary key. (so I suppose it is automatically indexed?)

Problem

I need to select all the rows whose IDX is in the text file.

My effort

Version 1

Make 7K queries, one for each line in the text file. This makes approximately 130 queries per second, costing about 1 minute to complete.

import pymysql
connection = pymysql.connect(....)
with connection.cursor() as cursor:
    query = (
        "SELECT *"
        " FROM TABLE1"
        " WHERE IDX = %(idx)s;"
    )

    all_selected = {}
    with open("idx_list.txt", "r") as f:
        for idx in f:
            idx = idx.strip()
            if idx:
                idx = int(idx)
                parameters = {"idx": idx}
                cursor.execute(query, parameters)
                result = cursor.fetchall()[0]
                all_selected[idx] = result

Version 2

Select the whole table, iterate over the cursor and cherry-pick rows. The for-loop over .fetchall_unbuffered() covers 30-40K rows per second, and the whole script costs about 3 minutes to complete.

import pymysql
connection = pymysql.connect(....)
with connection.cursor() as cursor:
    query = "SELECT * FROM TABLE1"

    set_of_idx = set()
    with open("idx_list.txt", "r") as f:
        for line in f:
            if line.strip():
                line = int(line.strip())
                set_of_idx.add(line)


    all_selected = {}
    cursor.execute(query)
    for row in cursor.fetchall_unbuffered():
        if row[0] in set_of_idx:
            all_selected[row[0]] = row[1:]

Expected behavior

I need to select faster, because the number of IDXs in the text file will grow as big as 10K-100K in the future.

I consulted other answers including this, but I can't make use of it since I only have read previlege, thus impossible to create another table to join with.

So how can I make the selection faster?


Solution

  • A temporary table implementation would look like:

    connection = pymysql.connect(....,local_infile=True)
    with connection.cursor() as cursor:
        cursor.execute("CREATE TEMPORARY TABLE R (IDX INT PRIMARY KEY)")
        cursor.execute("LOAD DATA LOCAL INFILE 'idx_list.txt' INTO R")
        cursor.execute("SELECT TABLE1.* FROM TABLE1 JOIN R USING ( IDX )")
        ..
        cursor.execute("DROP TEMPORARY TABLE R")