Search code examples
pythonmysqlpandaspymysql

How to increase the processing speed of mysql query in pandas


My ids in tuple is below

ids = tuple(df1['ids'])

the ids has 100k rows

My query

q_id=f'''

SELECT id, COUNT(*) AS Count 
FROM NEWS 
WHERE status = 'JOINED' AND id IN {ids} 
GROUP BY 1

connection is a method which establish connection with username and password

df2 = connection('mysqldb', q_id)

It's taking too much time to get the result. How to increase the processing speed of mysql query in pandas?


Solution

  • Please provide SHOW CREATE TABLE news.

    It needs to include

     INDEX(status, id)
    

    Is id the PRIMARY KEY? If so, then this is a rather weird query to write:

    SELECT id, COUNT(*) AS Count 
        FROM NEWS 
        WHERE status = 'JOINED'
          AND id IN {ids} 
        GROUP BY 1
    

    because all the counts will be "1". (This is because the PRIMARY KEY is "unique").

    To find out how many of those ids are 'JOINED':

    SELECT COUNT(*) AS Count 
        FROM NEWS 
        WHERE status = 'JOINED'
          AND id IN {ids} 
    

    To find out which of the ids are 'JOINED':

    SELECT id 
        FROM NEWS 
        WHERE status = 'JOINED'
          AND id IN {ids}