Search code examples
sqlpostgresqlpostgresql-9.3

Postgres group results with window function lag returns 0 rows


I'm trying to do a query where I want to ignore the first and the last row of the result query. In order to do this a hit was given using window functions like the query gave to me above

SELECT lag(timestamp_min)    OVER (ORDER BY timestamp_min) AS timestamp_min,
       lag(type)             OVER (ORDER BY timestamp_min) AS type,
       lag(sum_first_medium) OVER (ORDER BY timestamp_min),
FROM (SELECT to_timestamp(
                floor(
                   (extract('epoch' FROM TIMESTAMP) / 300)
                ) * 300
             ) AS timestamp_min,
             type,
             floor(sum(medium[1])) AS sum_first_medium
      FROM default_dataset
      WHERE type = 'ap_clients.wlan0'
        AND timestamp > current_timestamp - INTERVAL '85 minutes'
        AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
      GROUP BY timestamp_min, type) lagme
OFFSET 2;

The problem is the last query is not returning anything:

ws_controller_hist=> SELECT lag(timestamp_min) OVER (ORDER BY timestamp_min) AS timestamp_min, lag(type) OVER (ORDER BY timestamp_min) AS type, lag(sum_first_medium) OVER (ORDER BY timestamp_min) FROM (SELECT to_timestamp(floor((extract('epoch' FROM TIMESTAMP) / 300)) * 300) AS timestamp_min, type, floor(sum(medium[1])) AS sum_first_medium FROM default_dataset WHERE type = 'ap_clients.wlan0' AND timestamp > current_timestamp - INTERVAL '85 minutes' AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638' GROUP BY timestamp_min, type) lagme OFFSET 2;
 timestamp_min | type | lag
---------------+------+-----
(0 rows)

But I have data for the type "ap_clients.wlan0"

ws_controller_hist=> select * from default_dataset where type ='ap_clients.wlan0' order by timestamp desc limit 3;
                  id                  |       timestamp        | agregation_period | medium | maximum | minimum | sum |       type       |              device_id               | network_id |           organiza
tion_id            |     labels
--------------------------------------+------------------------+-------------------+--------+---------+---------+-----+------------------+--------------------------------------+------------+-------------------
-------------------+----------------
 b3661dca-a459-43cd-a3c4-7609e36c18d5 | 2018-01-02 10:21:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 abbca52d-f3f5-4a99-bd2f-41602964506e | 2018-01-02 10:16:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 24e00926-bc6d-4025-8a6c-a8de9efacdad | 2018-01-02 10:11:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
(3 rows)

I need to have a query retrieving the sum of all mediums in the last hour grouped by 5 minutes.

My first approach to resolve my problem to ignore the first record I used offset(1) and to ignore the last i was trying to do a limit in my id field, ordering by timestamp desc.

ws_controller_hist=>  
SELECT to_timestamp(floor((extract('epoch' FROM TIMESTAMP) / 300)) * 300) 
AS timestamp_min,
       TYPE,
       floor(sum(medium[1]))
FROM default_dataset
WHERE TYPE LIKE 'ap_clients.wlan0'
  AND TIMESTAMP > CURRENT_TIMESTAMP - interval '85 minutes'
  AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
  AND id NOT IN
    (SELECT id
     FROM default_dataset
     ORDER BY TIMESTAMP DESC
     LIMIT 1)
GROUP BY timestamp_min,
         TYPE
ORDER BY timestamp_min ASC
OFFSET 1;

     timestamp_min      |       type       | floor
------------------------+------------------+-------
 2017-12-19 14:20:00+00 | ap_clients.wlan0 |    38
 2017-12-19 14:25:00+00 | ap_clients.wlan0 |    37
 2017-12-19 14:30:00+00 | ap_clients.wlan0 |    39
 2017-12-19 14:35:00+00 | ap_clients.wlan0 |    42
 2017-12-19 14:40:00+00 | ap_clients.wlan0 |    43
 2017-12-19 14:45:00+00 | ap_clients.wlan0 |    44
 2017-12-19 14:50:00+00 | ap_clients.wlan0 |    45
 2017-12-19 14:55:00+00 | ap_clients.wlan0 |    45
 2017-12-19 15:00:00+00 | ap_clients.wlan0 |    43
 2017-12-19 15:05:00+00 | ap_clients.wlan0 |    43
 2017-12-19 15:10:00+00 | ap_clients.wlan0 |    50
 2017-12-19 15:15:00+00 | ap_clients.wlan0 |    52
 2017-12-19 15:20:00+00 | ap_clients.wlan0 |    50
 2017-12-19 15:25:00+00 | ap_clients.wlan0 |    53
 2017-12-19 15:30:00+00 | ap_clients.wlan0 |    49
 2017-12-19 15:35:00+00 | ap_clients.wlan0 |    39
 2017-12-19 15:40:00+00 | ap_clients.wlan0 |    16

But my last query is not ignoring the last record because i have the same records dont using the subquery " and id not in (select id from default_dataset order by timestamp desc limit 1) ".

If I try to make a query to see the results for the type "ap_clients.wlan0" I have

ws_controller_hist=> select * from default_dataset where organization_id='ce4b69af-bdce-4f1b-ba71-dd03544205d5' and type ='ap_clients.wlan0' order by timestamp desc limit 5;
                  id                  |       timestamp        | agregation_period | medium | maximum | minimum | sum |       type       |              device_id               | network_id |           organiza
tion_id            |     labels
--------------------------------------+------------------------+-------------------+--------+---------+---------+-----+------------------+--------------------------------------+------------+-------------------
-------------------+----------------
 b3661dca-a459-43cd-a3c4-7609e36c18d5 | 2018-01-02 10:21:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 abbca52d-f3f5-4a99-bd2f-41602964506e | 2018-01-02 10:16:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 24e00926-bc6d-4025-8a6c-a8de9efacdad | 2018-01-02 10:11:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 e67baf28-6d5b-43a5-85e2-fcf2d04a0b2e | 2018-01-02 10:06:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}
 c7ce16ce-9cda-423f-b32b-f4d6dce859e6 | 2018-01-02 10:01:08+00 |               300 | {0}    | {0}     | {0}     | {0} | ap_clients.wlan0 | 9f3f6261-a2c3-45cd-9dc4-f9523ace0b50 |            | ce4b69af-bdce-4f1b
-ba71-dd03544205d5 | {time,clients}

What can I do?


Solution

  • A simple solution is to use lag and lead window functions with a parameter that cannot be NULL, in such way lag will return NULL for the first row and lead will return NULL for the last row, so you can simple filter for rows where both of them are NOT NULL:

    SELECT
        t2.timestamp_min,
        t2.type,
        t2.sum_first_medium
    FROM (
        SELECT
            t1.*,
            lead(1) OVER(ORDER BY t1.timestamp_min) AS lead,
            lag(1) OVER(ORDER BY t1.timestamp_min) AS lag
        FROM (
            SELECT
                to_timestamp(
                  floor(
                    (extract('epoch' FROM TIMESTAMP) / 300)
                  ) * 300
                ) AS timestamp_min,
                type,
                floor(sum(medium[1])) AS sum_first_medium
            FROM default_dataset
            WHERE
                type = 'ap_clients.wlan0'
                AND timestamp > current_timestamp - INTERVAL '85 minutes'
                AND organization_id = '9fc02db4-c3df-4890-93ac-8dd575ca5638'
            GROUP BY timestamp_min, type
        ) t1
    ) t2
    WHERE
        t2.lag IS NOT NULL -- Only first row will return NULL, skip it
        AND t2.lead IS NOT NULL -- Only last row will return NULL, skip it
    ORDER BY t2.timestamp_min
    

    Notice I have used lead(1) and lag(1) just because 1 is a non-NULL expression, you could use any non-NULL expression or even a column there (since guaranteed to be NOT NULL).

    Another possible solution would be applying two row_number() calls, one use ORDER BY timestamp_min ASC and another using ORDER BY timestamp_min DESC, and then filtering the rows which those are <> 1. But that would require two sorts of the dataset (one for ASC and one for DESC), while the lag/lead solution requires only one (although it may be harder to understand).