I have a table that has many rows in it, with rows occurring at the rate of 400-500 per minute (I know this isn't THAT many), but I need to do some sort of 'trend' analysis on the data that has been collected over the last 1 minute.
Instead of pulling all records that have been entered and then processing each of those, I would really like to be able to select, say, 10 records - which occur at a -somewhat- even distribution through the timeframe specified.
ID DEVICE_ID LA LO CREATED
-------------------------------------------------------------------
1 1 23.4 948.7 2018-12-13 00:00:01
2 2 22.4 948.2 2018-12-13 00:01:01
3 2 28.4 948.3 2018-12-13 00:02:22
4 1 26.4 948.6 2018-12-13 00:02:33
5 1 21.4 948.1 2018-12-13 00:02:42
6 1 22.4 948.3 2018-12-13 00:03:02
7 1 28.4 948.0 2018-12-13 00:03:11
8 2 23.4 948.8 2018-12-13 00:03:12
...
492 2 21.4 948.4 2018-12-13 00:03:25
493 1 22.4 948.2 2018-12-13 00:04:01
494 1 24.4 948.7 2018-12-13 00:04:02
495 2 27.4 948.1 2018-12-13 00:05:04
Considering this data set, instead of pulling all those rows, I would like to maybe pull a row from the set every 50 records (10 rows for roughly ~500 rows returned).
This does not need to be exact, I just need a sample in which to perform some sort of linear regression on.
Is this even possible? I can do it in my application code if need be, but I wanted to see if there was a function or something in MySQL that would handle this.
Edit Here is the query I have tried, which works for now - but I would like the results more evenly distributed, not by RAND().
SELECT * FROM (
SELECT * FROM (
SELECT t.*, DATE_SUB(NOW(), INTERVAL 30 HOUR) as offsetdate
from tracking t
HAVING created > offsetdate) as parp
ORDER BY RAND()
LIMIT 10) as mastr
ORDER BY id ASC;
Do not order by RAND() as the rand calculated for every row, then reordered and only then you are selecting a few records.
You can try something like this:
SELECT
*
FROM
(
SELECT
tracking.*
, @rownum := @rownum + 1 AS rownum
FROM
tracking
, (SELECT @rownum := 0) AS dummy
WHERE
created > DATE_SUB(NOW(), INTERVAL 30 HOUR)
) AS s
WHERE
(rownum % 10) = 0
Index on created is "the must".
Also, you might consider to use something like 'AND (UNIX_TIMESTAMP(created) % 60 = 0)' which is slightly different from what you wanted, however might be OK (depends on your insert distribution)