Search code examples
mysqlcrondatabase-performance

MySQL "pileup" when importing rows


I have the following cron process running every hour to update global game stats:

  • Create temporary table
  • For each statistic, insert rows into the temporary table (stat key, user, score, rank)
  • Truncate main stats table
  • Copy data from temporary table to main table

The last step causes massive backlog in queries. Looking at SHOW PROCESSLIST I see a bunch of updating-status queries that are stuck until the copy completes (which may take up to a minute).

However I did notice that it's not like it has consecutive query IDs piling up, many queries get completed just fine. So it almost seems like it's a "thread" that gets stuck or something. Also of note is that the stuck updates have nothing in common with the ongoing copy (different tables, etc)

So:

  • Can I have cron connect to MySQL on a dedicated "thread" such that its disk activity (or whatever it is) doesn't lock other updates, OR
  • Am I misinterpreting what's going on, and if so how can I find out what the actual case is?

Let me know if you need any more info.


Solution

  • MySQL threads are not perfectly named. If you're a Java dev, for example, you might make some untrue assumptions about MySQL threads based on your Java knowledge.

    For some reason that's hard to diagnose from a distance, your copy step is blocking some queries from completing. If you're curious about which ones try doing

     SHOW FULL PROCESSLIST
    

    and try to make sense of the result.

    In the meantime, you might consider a slightly different approach to refreshing these hourly stats.

    1. create a new, non temporary table, calling it something like stats_11 for the 11am update. If the table with that name already existed, drop the old one first.
    2. populate that table as needed.
    3. add the indexes it needs. Sometimes populating the table is faster if the indexes aren't in place while you're doing it.
    4. create or replace view stats as select * from stats_11

    Next hour, do the same with stats_12. The idea is to have your stats view pointing to a valid stats table almost always.

    This should reduce your exposure time to the stats-table building operaiton.