Search code examples
postgresqlcalculated-columnslag

Using Postgres LAG with a TEMP table


In a mobile app that I am currently developing I keep track, serverside, of how many visitors enter/leave the area of visibility of one or more BLE beacons. The app communicates the unique (string) ID of the beacon along with the day (as a difference from 01/04/2021) and 5 minute time slot at which the event happened. All of this ends up - via an intermediate Redis clearing house - in the following PGSQL table

CREATE TABLE iobeacons (
 "beacon" character varying(24) NOT NULL,
 "tag" integer NOT NULL,
 "slot" smallint NOT NULL,
 "hits" smallint NOT NULL,
 "action" smallint NOT NULL
) WITH (oids = false);

INSERT INTO "iobeacons" ("beacon", "tag", "slot", 
"hits", "action") VALUES
('abc', 1,  1,  5,  -1),
('abc', 1,  2,  4,  -1),
('abc', 1,  1,  15, 1),
('abc', 1,  3,  2,  -1),
('abc', 1,  4,  2,  -1),
('abc', 1,  5,  2,  -1);

With this done I have the following table

Beacon   Tag   Slot  Hits   Action
----------------------------------
  abc     1      1      5     -1
  abc     1      2      4     -1
  abc     1      1     15      1
  abc     1      3      2     -1
  abc     1      4      2     -1
  abc     1      5      2     -1

What I can infer from this table is the following

  • 15 visitors entered beacon field abc in the first five minute time slot on Day 1
  • 4 of those left that beacon field during the next five minute interval
  • 2 visitors lingered for each of the periods: 10 - 15 minutes, 15 - 20 minutes and > 20 minutes

What I need to do is be able to run SQL queries to extract such information efficiently. Take for instance the question

Find the number of visitors who entered beacon field abc during the 1st 5 minute slot on Day 1 and then left after a stay duration of between 10 & 15 minutes.

SQL is not quite my forte so all I have been able to come up with thus far is this

CREATE TEMP TABLE hitset(sumhits) AS (select sum(hits) from iobeacons where beacon = 'abc' and tag = 1 and (slot < 2) and (action = 1) UNION select sum(hits) from iobeacons where beacon = 'abc' and tag = 1 and (slot = 2 or slot = 3) and action = -1);

select sumhits,lag(sumhits,1) over (order by sumhits) from hitset;

Where I am attempting to do the following

  • Create a temporary table with a column called sumhits
  • Insert into that table the sum of all entering "hits" for beacon field abc on Day 1 and time slot 1 AND
  • the sum of all exiting "hits" for the same beacon field on Day 1 and during the time slots 2 or 3

All good so far. I get the following table

 sumhits
    6
   15

Here I enter unchartered (for me) territory with LAG. What I have attempted to do is to get the difference between the sumhits columns of two rows of the temp table. The result I get is this

sumhits    lag
   6       NULL
  15        6

My, incomplete, understanding of LAG suggested that in the second row, above, I should get 15 (Row 2) - 6 (Row 1) = 9. Why is this not the result?


This is an rephrased version of a question I had asked yesterday - quite extensively rephrased, and with my own efforts thus far, for clarity.


Solution

  • I am slightlty surprised not to have got an answer here. I am leaving the question with my own answer. LAG simply gets a column from a prior row. Any arithmetic against the corresponding column in the current row has to be performed in the select. So in this instance that last SQL SELECT should read

    select sumhits - lag(sumhits,1) over (order by sumhits) from hitset;
    

    You will find an excellent reference on performing such calculations using LAG here.