Search code examples
mysqlsqlalchemyquery-optimizationgreatest-n-per-groupdatabase-optimization

optimize latest timestamp query mysql


to get only data with latest timestamp i use the mysql query from this answer:

fetch-the-row-which-has-the-max-value- for-a-column

my query is following:

SELECT stock_id,timestamp,price FROM market m1
                        WHERE timestamp = 
                            (SELECT MAX(timestamp) FROM market m2 WHERE m1.stock_id = m2.stock_id)

but it takes 10 minutes to execute.

what are the different options to optimize it? (mysql)

market has following schema: (sqlalchemy)

class Market(db.Model):
    stock_id=db.Column(db.Integer,db.ForeignKey('stock.id'),primary_key=True)
    timestamp=db.Column(db.Integer,primary_key=True)
    price=db.Column(db.Float)

Solution

  • You are using a so-called dependent subquery. These can sometime be very inefficient.

    You can use this subquery to retrieve the max timestamp for each stock_id value.

                       SELECT MAX(timestamp) timestamp, stock_id
                         FROM market
                        GROUP BY stock_id
    

    Your compound primary key should make this quite fast because MySQL will do a "loose index scan" to satisfy it.

    Then you can use this as a subquery, as follows:

      SELECT m.stock_id, m.timestamp, m.price
        FROM market m
        JOIN (
                       SELECT MAX(timestamp) timestamp, stock_id
                         FROM market
                        GROUP BY stock_id
             ) maxt on m.stock_id = maxt.stock_id AND m.timestamp = maxt.timestamp
    

    This, too, should exploit your primary key.