sql sqlite timestamp distinct timespan

Get distinct values within time chunks


I have a table of species and their occurrences (timestamps) from a field survey:

species | timestamp
A       | 2021-04-01T12:35
A       | 2021-04-01T12:36
B       | 2021-04-01T12:36
A       | 2021-04-01T12:37
A       | 2021-04-01T12:39
C       | 2021-04-01T12:40
B       | 2021-04-01T12:42
C       | 2021-04-01T12:44
C       | 2021-04-01T12:44
B       | 2021-04-01T12:47

Now I want to count them, not in total but in chunks of 10 minutes each, where only the first occurrence of each species within a chunk is counted. That means between 12:31 and 12:40 there are species A, B and C. Between 12:41 and 12:50 there are species B and C.

For a single 10-minute chunk I can do something like:

SELECT DISTINCT(species) 
FROM table
WHERE timestamp IS <condition>

But what I need is something like:

chunk_start_time | chunk_end_time   | species 
2021-04-01T12:31 | 2021-04-01T12:40 | A       
2021-04-01T12:31 | 2021-04-01T12:40 | B       
2021-04-01T12:31 | 2021-04-01T12:40 | C       
2021-04-01T12:41 | 2021-04-01T12:50 | B       
2021-04-01T12:41 | 2021-04-01T12:50 | C       

My timestamps have no seconds, which is why each chunk runs from minute x1 to minute x0. How can I do the math in SQLite, or would it be better to use Python pandas for that?


Solution

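  • Use the date and time function strftime() to compute each row's chunk boundaries from the last digit of the minute. This relies on the timestamps having no seconds, so that the last character of the string is the units digit of the minute: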

    SELECT DISTINCT
           -- Step back to the minute ending in 1 (x0 belongs to the chunk that started 9 minutes earlier)
           strftime(
             '%Y-%m-%dT%H:%M', 
             timestamp, 
             '-' || CASE WHEN timestamp LIKE '%0' THEN 9 ELSE SUBSTR(timestamp, -1) - 1 END || ' minute'
           ) AS chunk_start_time,
           -- Step forward to the minute ending in 0 (x0 is already the end of its chunk)
           strftime(
             '%Y-%m-%dT%H:%M', 
             timestamp, 
             '+' || CASE WHEN timestamp LIKE '%0' THEN 0 ELSE 10 - SUBSTR(timestamp, -1) END || ' minute'
           ) AS chunk_end_time,
           species
    FROM tablename
    ORDER BY chunk_start_time, species;
    

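To try the query locally, here is a minimal sketch that loads the sample rows from the question into an in-memory SQLite database using Python's standard sqlite3 module. The table name tablename follows the query above; the TEXT column types and the helper name distinct_species_per_chunk are assumptions made for illustration, not part of the original question.

```python
import sqlite3

# Sample rows copied from the question: (species, timestamp without seconds).
ROWS = [
    ("A", "2021-04-01T12:35"),
    ("A", "2021-04-01T12:36"),
    ("B", "2021-04-01T12:36"),
    ("A", "2021-04-01T12:37"),
    ("A", "2021-04-01T12:39"),
    ("C", "2021-04-01T12:40"),
    ("B", "2021-04-01T12:42"),
    ("C", "2021-04-01T12:44"),
    ("C", "2021-04-01T12:44"),
    ("B", "2021-04-01T12:47"),
]

# The query from the answer: the last character of the timestamp (the units
# digit of the minute) decides how far to shift back to x1 and forward to x0.
QUERY = """
SELECT DISTINCT
       strftime('%Y-%m-%dT%H:%M', timestamp,
                '-' || CASE WHEN timestamp LIKE '%0' THEN 9
                            ELSE SUBSTR(timestamp, -1) - 1 END || ' minute'
       ) AS chunk_start_time,
       strftime('%Y-%m-%dT%H:%M', timestamp,
                '+' || CASE WHEN timestamp LIKE '%0' THEN 0
                            ELSE 10 - SUBSTR(timestamp, -1) END || ' minute'
       ) AS chunk_end_time,
       species
FROM tablename
ORDER BY chunk_start_time, species;
"""


def distinct_species_per_chunk(rows):
    """Return (chunk_start_time, chunk_end_time, species) tuples for the given rows."""
    conn = sqlite3.connect(":memory:")
    try:
        # Assumed schema: both columns stored as TEXT.
        conn.execute("CREATE TABLE tablename (species TEXT, timestamp TEXT)")
        conn.executemany("INSERT INTO tablename VALUES (?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in distinct_species_per_chunk(ROWS):
        print(" | ".join(row))
```

Because the result comes back as plain tuples, counting species per chunk afterwards is a matter of grouping on the first two fields, either in SQL with GROUP BY or on the Python side.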