I have a time-series data of below format:
datatable(Element: string, Timestamp:datetime, Value:long)
For each Element there is a Timestamp and associated Value for the timestamp. If 2 consecuitive timestamps for a element is more then X-min apart, they are considered to be part of different sessions (The smaller timestamp being end of previous session and bigger one is starting of a new session). For each such session I want to calculate the SessionId (based on session start or a random guid), session start and session end.
Example: (Considering a timestamp with gap of 30-min from previous is considered as starting of a new session)
Input:
Element Timestamp Value
Element-A 2022-03-25 06:15:00 10
Element-A 2022-03-25 06:30:00 10
Element-A 2022-03-25 06:45:00 10
Element-A 2022-03-25 08:15:00 10
Element-A 2022-03-25 08:30:00 10
Element-A 2022-03-25 08:45:00 10
Element-B 2022-03-25 07:15:00 10
Element-B 2022-03-25 07:30:00 10
Element-B 2022-03-25 07:45:00 10
Element-B 2022-03-25 09:15:00 10
Element-B 2022-03-25 09:30:00 10
Element-B 2022-03-25 09:45:00 10
Expected output:
Element Timestamp value SessionId SessionStart SessionEnd
Element-A 2022-03-25 06:15:00 10 guid-1 2022-03-25 06:15:00 2022-03-25 06:45:00
Element-A 2022-03-25 06:30:00 10 guid-1 2022-03-25 06:15:00 2022-03-25 06:45:00
Element-A 2022-03-25 06:45:00 10 guid-1 2022-03-25 06:15:00 2022-03-25 06:45:00
Element-A 2022-03-25 08:15:00 10 guid-2 2022-03-25 08:15:00 2022-03-25 08:45:00
Element-A 2022-03-25 08:30:00 10 guid-2 2022-03-25 08:15:00 2022-03-25 08:45:00
Element-A 2022-03-25 08:45:00 10 guid-2 2022-03-25 08:15:00 2022-03-25 08:45:00
Element-B 2022-03-25 07:15:00 10 guid-3 2022-03-25 07:15:00 2022-03-25 07:45:00
Element-B 2022-03-25 07:30:00 10 guid-3 2022-03-25 07:15:00 2022-03-25 07:45:00
Element-B 2022-03-25 07:45:00 10 guid-3 2022-03-25 07:15:00 2022-03-25 07:45:00
Element-B 2022-03-25 09:15:00 10 guid-4 2022-03-25 09:15:00 2022-03-25 09:45:00
Element-B 2022-03-25 09:30:00 10 guid-4 2022-03-25 09:15:00 2022-03-25 09:45:00
Element-B 2022-03-25 09:45:00 10 guid-4 2022-03-25 09:15:00 2022-03-25 09:45:00
The data volume is high. Please suggest with perf-efficient queries to achieve this.
Per the OP comments, adding a solution with only the summarization part. Please note that -
Element
& SessionIndex
is unique and can be used interchangeably with SessionId
(based on new_guid()
)count_if
) etc.datatable (Element:string, Timestamp:datetime, Value:int)
[
"Element-A" ,"2022-03-25 06:15:00" ,10
,"Element-A" ,"2022-03-25 06:30:00" ,10
,"Element-A" ,"2022-03-25 06:45:00" ,10
,"Element-A" ,"2022-03-25 08:15:00" ,10
,"Element-A" ,"2022-03-25 08:30:00" ,10
,"Element-A" ,"2022-03-25 08:45:00" ,10
,"Element-B" ,"2022-03-25 07:15:00" ,10
,"Element-B" ,"2022-03-25 07:30:00" ,10
,"Element-B" ,"2022-03-25 07:45:00" ,10
,"Element-B" ,"2022-03-25 09:15:00" ,10
,"Element-B" ,"2022-03-25 09:30:00" ,10
,"Element-B" ,"2022-03-25 09:45:00" ,10
]
| partition hint.strategy=shuffle by Element
(
order by Timestamp asc
| extend SessionIndex = row_cumsum(iff(Timestamp - prev(Timestamp) > 30m, 1, 0))
| summarize SessionStart = min(Timestamp), SessionEnd = max(Timestamp) by SessionIndex
| extend Element, SessionId = new_guid()
| project-reorder Element
)
Element | SessionIndex | SessionStart | SessionEnd | SessionId |
---|---|---|---|---|
Element-A | 0 | 2022-03-25T06:15:00Z | 2022-03-25T06:45:00Z | 5d43e356-9aae-40cb-9e2e-bd2741cc9934 |
Element-B | 0 | 2022-03-25T07:15:00Z | 2022-03-25T07:45:00Z | df83db35-c292-4bee-a14e-0ebc2b7ef6b5 |
Element-A | 1 | 2022-03-25T08:15:00Z | 2022-03-25T08:45:00Z | 40dbaa02-b110-4e99-8696-2505a2995553 |
Element-B | 1 | 2022-03-25T09:15:00Z | 2022-03-25T09:45:00Z | 59d6fdeb-a596-4fab-97e5-d9057519c6c0 |
You can start with this.
The demographics of your data (number of records, number of elements, number of sessions per element) will determine how optimized this solution is for your specific needs.
datatable (Element:string, Timestamp:datetime, Value:int)
[
"Element-A" ,"2022-03-25 06:15:00" ,10
,"Element-A" ,"2022-03-25 06:30:00" ,10
,"Element-A" ,"2022-03-25 06:45:00" ,10
,"Element-A" ,"2022-03-25 08:15:00" ,10
,"Element-A" ,"2022-03-25 08:30:00" ,10
,"Element-A" ,"2022-03-25 08:45:00" ,10
,"Element-B" ,"2022-03-25 07:15:00" ,10
,"Element-B" ,"2022-03-25 07:30:00" ,10
,"Element-B" ,"2022-03-25 07:45:00" ,10
,"Element-B" ,"2022-03-25 09:15:00" ,10
,"Element-B" ,"2022-03-25 09:30:00" ,10
,"Element-B" ,"2022-03-25 09:45:00" ,10
]
| partition hint.strategy=shuffle by Element
(
order by Timestamp asc
| extend SessionIndex = row_cumsum(iff(Timestamp - prev(Timestamp) > 30m, 1, 0))
| summarize min(Timestamp), max(Timestamp), make_list(Timestamp), make_list(Value) by SessionIndex
| extend SessionId = new_guid()
| mv-apply Timestamp = list_Timestamp to typeof(datetime), Value = list_Value to typeof(int) on (project Timestamp, Value)
| project Element, Timestamp, Value, SessionStart = min_Timestamp, SessionEnd = max_Timestamp, SessionId, SessionIndex
)
Element | Timestamp | Value | SessionStart | SessionEnd | SessionId | SessionIndex |
---|---|---|---|---|---|---|
Element-A | 2022-03-25T06:15:00Z | 10 | 2022-03-25T06:15:00Z | 2022-03-25T06:45:00Z | 1ac146b1-24fa-427e-b2b3-663d83297d4c | 0 |
Element-A | 2022-03-25T06:30:00Z | 10 | 2022-03-25T06:15:00Z | 2022-03-25T06:45:00Z | 1ac146b1-24fa-427e-b2b3-663d83297d4c | 0 |
Element-A | 2022-03-25T06:45:00Z | 10 | 2022-03-25T06:15:00Z | 2022-03-25T06:45:00Z | 1ac146b1-24fa-427e-b2b3-663d83297d4c | 0 |
Element-B | 2022-03-25T07:15:00Z | 10 | 2022-03-25T07:15:00Z | 2022-03-25T07:45:00Z | cbef109a-73bc-4067-9e7f-ebada6aa444e | 0 |
Element-B | 2022-03-25T07:30:00Z | 10 | 2022-03-25T07:15:00Z | 2022-03-25T07:45:00Z | cbef109a-73bc-4067-9e7f-ebada6aa444e | 0 |
Element-B | 2022-03-25T07:45:00Z | 10 | 2022-03-25T07:15:00Z | 2022-03-25T07:45:00Z | cbef109a-73bc-4067-9e7f-ebada6aa444e | 0 |
Element-A | 2022-03-25T08:15:00Z | 10 | 2022-03-25T08:15:00Z | 2022-03-25T08:45:00Z | c53fba2e-b82e-418c-9380-1e732be8fcb5 | 1 |
Element-A | 2022-03-25T08:30:00Z | 10 | 2022-03-25T08:15:00Z | 2022-03-25T08:45:00Z | c53fba2e-b82e-418c-9380-1e732be8fcb5 | 1 |
Element-A | 2022-03-25T08:45:00Z | 10 | 2022-03-25T08:15:00Z | 2022-03-25T08:45:00Z | c53fba2e-b82e-418c-9380-1e732be8fcb5 | 1 |
Element-B | 2022-03-25T09:15:00Z | 10 | 2022-03-25T09:15:00Z | 2022-03-25T09:45:00Z | 4ab89211-4378-45d3-8ac7-a570942e2807 | 1 |
Element-B | 2022-03-25T09:30:00Z | 10 | 2022-03-25T09:15:00Z | 2022-03-25T09:45:00Z | 4ab89211-4378-45d3-8ac7-a570942e2807 | 1 |
Element-B | 2022-03-25T09:45:00Z | 10 | 2022-03-25T09:15:00Z | 2022-03-25T09:45:00Z | 4ab89211-4378-45d3-8ac7-a570942e2807 | 1 |