to get only data with latest timestamp i use the mysql query from this answer:
fetch-the-row-which-has-the-max-value- for-a-column
my query is following:
SELECT stock_id,timestamp,price FROM market m1
WHERE timestamp =
(SELECT MAX(timestamp) FROM market m2 WHERE m1.stock_id = m2.stock_id)
but it takes 10 minutes to execute.
what are the different options to optimize it? (mysql)
market has following schema: (sqlalchemy)
class Market(db.Model):
stock_id=db.Column(db.Integer,db.ForeignKey('stock.id'),primary_key=True)
timestamp=db.Column(db.Integer,primary_key=True)
price=db.Column(db.Float)
You are using a so-called dependent subquery. These can sometime be very inefficient.
You can use this subquery to retrieve the max timestamp for each stock_id value.
SELECT MAX(timestamp) timestamp, stock_id
FROM market
GROUP BY stock_id
Your compound primary key should make this quite fast because MySQL will do a "loose index scan" to satisfy it.
Then you can use this as a subquery, as follows:
SELECT m.stock_id, m.timestamp, m.price
FROM market m
JOIN (
SELECT MAX(timestamp) timestamp, stock_id
FROM market
GROUP BY stock_id
) maxt on m.stock_id = maxt.stock_id AND m.timestamp = maxt.timestamp
This, too, should exploit your primary key.