Problem domain: I (well, the business) have a wifi network that accepts public connections. We want to know how long each device is staying connected to each of the access points (AP). This is known as 'dwell time'. The problem is complicated as a device can, and usually does, move around between APs during the day and will often come back to many of them more than once.
We currently use Splunk as our data capture and reporting tool and it does this auto-magically but we are considering a move to AWS and so will need to re-form everything using a combination of ETL and SQL.
I have data that looks like the below:
rowID clientMAC apMAC timeSeen
100 1 a 12:01
101 1 a 12:03
102 1 a 12:05
103 1 b 12:10
104 1 b 12:20
105 2 a 12:20
106 2 a 12:22
107 1 a 13:00
108 1 a 13:02
109 1 a 13:06
110 1 a 13:12
My challenge is to report the duration of each example of clientAP+macAP, so for example, how long was clientMAC=1
connected to apMAC=a
.
I can't take the final timeSeen
from the initial timeSeen
as clientMAC=1
connects to apMAC=b
in the middle, so the result would include the time of that connection too.
The plain english logic of what I need to do is:
For each grouping of clientMAC
and apMAC
, determine the connection duration during the chosen time period. If there is a gap of, say 15 minutes between rows that are of the same combination, start a new duration calculation and close off the old one. Essentially, each set of a given clientMAC
being seen at a given apMAC
should be a seperate 'transaction' and reported as a single line.
So the desired output is something like:
clientMAC apMAC Duration
1 a ...
1 b ...
2 a ...
1 a ...
A version that doesn't use LAG(), and thus will work on older versions of SQL (LAG is SQL Server 2012 onwards), just in case. I have a lot of clients still using SQL Server 2008, and frequently need solutions that will work on older version, so someone else might need the same too!
This example includes creating some test data, so you can see it working and the results
-- Create a temp table to hold the test data
CREATE TABLE #TestData
(
rowID INT NOT NULL PRIMARY KEY,
clientMAC INT NOT NULL,
apMAC VARCHAR(1) NOT NULL,
timeSeen DATETIME NOT NULL
)
-- Create some test data
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (100, 1, 'a', '2019-Nov-01 12:01:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (101, 1, 'a', '2019-Nov-01 12:02:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (102, 1, 'a', '2019-Nov-01 12:05:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (103, 1, 'b', '2019-Nov-01 12:10:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (104, 1, 'b', '2019-Nov-01 12:20:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (105, 2, 'a', '2019-Nov-01 12:20:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (106, 2, 'a', '2019-Nov-01 12:22:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (107, 1, 'a', '2019-Nov-01 13:00:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (108, 1, 'a', '2019-Nov-01 13:02:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (109, 1, 'a', '2019-Nov-01 13:06:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (110, 1, 'a', '2019-Nov-01 13:12:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (111, 1, 'a', '2019-Nov-01 14:00:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (112, 1, 'a', '2019-Nov-01 14:12:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (113, 1, 'a', '2019-Nov-01 14:14:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (114, 1, 'a', '2019-Nov-01 14:30:00')
INSERT INTO #TestData (rowID, clientMAC, apMAC, timeSeen) VALUES (115, 1, 'a', '2019-Nov-01 14:35:00')
-- Start of Actual Code
-- Store our maximum allowed gap in minuutes into a variable
DECLARE @MaximumGapMinutes INT = 15
-- Create temp table for primary calculated data
CREATE TABLE #DwellTimes (
rowID INT NOT NULL PRIMARY KEY,
clientMAC INT NOT NULL,
apMAC VARCHAR(1) NOT NULL,
timeSeen DATETIME NOT NULL,
lastSeen DATETIME NOT NULL,
DwellTime INT NOT NULL
)
-- Populate temp table
INSERT INTO #DwellTimes
SELECT *, DateDiff(MINUTE, lastSeen, timeSeen) AS DwellTime
FROM (
SELECT *, IsNull((SELECT TOP 1 timeSeen
FROM #TestData TDInner
WHERE TDInner.clientMac = TDMain.clientMac AND TDInner.apMac = TDMain.apMac
AND TDInner.timeSeen < TDMain.timeSeen
ORDER BY timeSeen DESC
), timeSeen) AS lastSeen
FROM #TestData TDMain
) InnerTable
-- Calculate the Dwell Time for visits, counting gaps longer than @MaximumGapMinutes as a new visit
SELECT Min(timeSeen) AS StartTime, clientMac, apMac,
SUM(CASE WHEN DwellTime > @MaximumGapMinutes THEN 0 ELSE DwellTime END) AS DwellTime
FROM (
SELECT *, (SELECT COUNT(*)
FROM #DwellTimes DSub
WHERE DSub.clientMac = DMain.clientMac AND DSub.apMac = DMain.apMac
AND DSub.timeSeen <= DMain.timeSeen AND DSub.DwellTime > 15) AS GapNumber
FROM #DwellTimes DMain
) InnerTable
GROUP BY clientMac, apMac, GapNumber
ORDER BY StartTime, clientMAC, apMAC, DwellTime
-- Clean up after ourselves
DROP TABLE #DwellTimes
-- End of Actual Code
-- Clean up after ourselves
DROP TABLE #TestData
Results:-
An explanation of how it works.
In the actual code itself, not the test data prep, the first thing we do is declare a variable for our maximum gap beween times to still be considered part of the same visit
-- Store our maximum allowed gap in minuutes into a variable
DECLARE @MaximumGapMinutes INT = 15
We then create a temporary table to hold Dwell Time calculations, and populate it
-- Populate temp table
INSERT INTO #DwellTimes
SELECT *, DateDiff(MINUTE, lastSeen, timeSeen) AS DwellTime
FROM (
SELECT *, IsNull((SELECT TOP 1 timeSeen
FROM #TestData TDInner
WHERE TDInner.clientMac = TDMain.clientMac AND TDInner.apMac = TDMain.apMac
AND TDInner.timeSeen < TDMain.timeSeen
ORDER BY timeSeen DESC
), timeSeen) AS lastSeen
FROM #TestData TDMain
) InnerTable
The inner select finds the previous timeSeen for that clientMac and apMac. If there is no previous time seen, then it uses the current timeSeen (the IsNull(subselect, timeSeen) section).
The outer select then calculates the Dwell Time between the current timeSeen and the previous one (lastSeen) for the same clientMac and apMac. Because we use the current timeSeen as lastSeen if there is no previous visit, the Dwell Time will be zero if this is the first visit.
Results stored in #DwellTimes
Finally, we calculate the actual visits and Dwell Times, taking a gap longer than our maximum as a new visit.
-- Calculate the Dwell Time for visits, counting gaps longer than @MaximumGapMinutes as a new visit
SELECT Min(timeSeen) AS StartTime, clientMac, apMac,
SUM(CASE WHEN DwellTime > @MaximumGapMinutes THEN 0 ELSE DwellTime END) AS DwellTime
FROM (
SELECT *, (SELECT COUNT(*)
FROM #DwellTimes DSub
WHERE DSub.clientMac = DMain.clientMac AND DSub.apMac = DMain.apMac
AND DSub.timeSeen <= DMain.timeSeen AND DSub.DwellTime > 15) AS GapNumber
FROM #DwellTimes DMain
) InnerTable
GROUP BY clientMac, apMac, GapNumber
ORDER BY StartTime, clientMAC, apMAC, DwellTime
The inner select here adds a field GapNumber, which we can use as a grouping field. This is simply a count of how many previous records exceeded our maximum, including the current record. Thus if the current record exceeds the maximum, it is the start of a new visit.
Results with Gap Number
And finally, grouping by clientMac, apMac and GapNumber allows us to use the sum of DwellTime as the Dwell Time per Visit, provided we set GapNumber to 0 if it's larger than our maximum, as that will be the start of the visit
SUM(CASE WHEN DwellTime > @MaximumGapMinutes THEN 0 ELSE DwellTime END) AS DwellTime
Hope this is useful to someone!