We have a chat system that generates multiple event logs per second sometimes for every event during a chat. The issue is that these consume a massive amount of data storage (which is very expensive on that platform) and we'd like to streamline what we actually store and delete things that really aren't necessary.
To that end, there's an event type for what position in the queue the chat is. We don't care about each position as long as they are not intervening events for that chat. So we want to keep only the first and last in each distinct group where there were no other event types to just get "total time in queue" for that period.
To complicate this, a customer can go in and out of queue as they get transferred by department, so the SAME CHAT can have multiple blocks of these queue position records. I've tried using FIRST_VALUE and LAST_VALUE and it gets me most of the way there, but fails when we have the case of two distinct blocks of these events.
Here's the script to generate the test data:
<!-- language: lang-sql -->
CREATE TABLE #testdata (
id varchar(18),
name varchar(8),
[type] varchar(20),
livechattranscriptid varchar(18),
groupid varchar(40))
INSERT INTO #testdata (id,name,[type],livechattranscriptid,groupid) VALUES
('0DZ14000003I2pOGAS','34128314','ChatRequest','57014000000ltfIAAQ','57014000000ltfIAAQChatRequest'),
('0DZ14000003IGmQGAW','34181980','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IHbqGAG','34185171','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003ILuHGAW','34201743','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IQ6cGAG','34217778','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IR7JGAW','34221794','PushAssignment','57014000000ltfIAAQ','57014000000ltfIAAQPushAssignment'),
('0DZ14000003IiDnGAK','34287448','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IiDoGAK','34287545','PushAssignment','57014000000ltfIAAQ','57014000000ltfIAAQPushAssignment'),
('0DZ14000003Iut5GAC','34336044','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003Iv7HGAS','34336906','Accept','57014000000ltfIAAQ','57014000000ltfIAAQAccept')
And here is the attempt to identify anything that was the first and last id for it's group ordered by the name field and grouped by the transcriptid:
select *,FIRST_VALUE(id) OVER(Partition BY groupid order by livechattranscriptid,name asc) as firstinstancegroup,
LAST_VALUE(id) OVER(Partition BY groupid order by livechattranscriptid,name asc RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) as lastinstancegroup from #testdata order by livechattranscriptid,name
The issue is, it gives me the same first and last id for ALL of them by that entire group rather than treating each group of Enqueue records as a distinct group. How would I treat each distinct grouping instance of Enqueue as a unique group?
Here's a similar solution Grouping contiguous table data
not pretty but you will find the logic based from the OP. contiguous data over the same column
declare @mytable table (
id varchar(18),
name varchar(8),
[type] varchar(20),
livechattranscriptid varchar(18),
groupid varchar(100))
INSERT INTO @mytable (id,name,[type],livechattranscriptid,groupid) VALUES
('0DZ14000003I2pOGAS','34128314','ChatRequest','57014000000ltfIAAQ','57014000000ltfIAAQChatRequest'),
('0DZ14000003IGmQGAW','34181980','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IHbqGAG','34185171','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003ILuHGAW','34201743','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IQ6cGAG','34217778','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IR7JGAW','34221794','PushAssignment','57014000000ltfIAAQ','57014000000ltfIAAQPushAssignment'),
('0DZ14000003IiDnGAK','34287448','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003IiDoGAK','34287545','PushAssignment','57014000000ltfIAAQ','57014000000ltfIAAQPushAssignment'),
('0DZ14000003Iut5GAC','34336044','Enqueue','57014000000ltfIAAQ','57014000000ltfIAAQEnqueue'),
('0DZ14000003Iv7HGAS','34336906','Accept','57014000000ltfIAAQ','57014000000ltfIAAQAccept')
;with myend as ( --- get all ends
select
*
from
(select
iif(groupid <> lead(groupid,1,groupid) over (order by name),
id,
'x') [newid],name
from @mytable
)x
where newid <> 'x'
)
, mystart as -- get all starts
(
select
*
from
(select
iif(groupid <> lag(groupid,1,groupid) over (order by name),
id,
'x') [newid], name,type,livechattranscriptid
from @mytable
)x
where newid <> 'x'
) ,
finalstart as ( --- get all starts including the first row
select id,
name,type,livechattranscriptid,
row_number() over (order by name) rn
from (
select id,name,type,livechattranscriptid
from (
select top 1 id, name,type,livechattranscriptid
from @mytable
order by name) x
union all
select newid,name,type,livechattranscriptid from mystart
) y
),
finalend as -- get all ends and add the last row
(
select id,
row_number() over (order by name) rn
from (
select id,name from (
select top 1 id,name
from @mytable
order by name desc) x
union all
select newid,name from myend
) y
)
select
s.id [startid]
,s.name
,s.type
,s.livechattranscriptid
,e.id [lastid]
from
finalend e
inner join finalstart s
on e.rn = s.rn --- bind the two results over the positions or row number