Search code examples
google-sheetsgroup-bygoogle-sheets-formulabinning

Google sheets binning and group by (custom time interval)


I wish to count events that occurred within a custom time interval : it could be within 24h, or within a week or 2-months span.

I am using google sheets: I can create a pivot table and group by month, however I'd like to explore insights using custom intervals (I'm looking for pattern in epilepsy).

As final result, I wanna have a table that, for each day, it is reported the number of frequencies within that interval.

Particularly, I wanna focus on the interval of 24h to count the number of events of epilepsy (known as cluster seizures).

And then, on custom days intervals to explore periodicity or trends - like each 48 hours, or each 15 or 30 days.

See a mockup of Google Sheet here:

https://docs.google.com/spreadsheets/d/1tCxYV5mUcq6vKm8-fL-0HUAOjcB9fipLCqPD2Znv-X0/edit#gid=1372548551


I tried this attempts:

  1. find out how many events occurred in the last 30 days prior to the reported date:
= IFERROR(
  QUERY(                                                
     A:E,                                             
     "SELECT COUNT(A)                                   
      WHERE 
        A IS NOT NULL AND 
        E = FALSE AND 
        A >= date '" & 
          TEXT(
            A2-30,                                    
            "yyyy-MM-dd"                                
          ) &"' AND
        A <= date '" & 
          TEXT(
            A2,                                    
            "yyyy-MM-dd"                                
          ) &"'

    LABEL COUNT(A) '' "), "N/A")

Then, dragging the cell, I get the column "# events in the prior 30 days".

It works but seems a bit messy - especially for updating the intervals.

  1. I tried this other approach:
=query(B:E, "select B, count(E), -1+count(E) where E = FALSE group by B label B 'Date with Clusters', count(E) 'Cluster seizures '")

That produces the last table.

I like this approach better, but here I am just grouping by the same date, without possibility to have a custom interval.

As an example, I will have that two events will be counted within the same day, not withing the same 24h interval.

Could you tell a better approach to handle datetime differences, so to create binning and group by with custom intervals ?

Below an example: on the left table, data in input; on the middle column, result of first approach; on the right table, results of second approach.

enter image description here


Solution

  • given the table:

    in order to group stuff with QUERY we need to "fix" the A column in order to get a custom period. lets say we need to group events every 3 weeks (21 days). we take the lowest and highest date and create a sequence with all the dates in between.

    =INDEX(ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A))))
    

    then we use running total on it to get every date which is 21 days apart from the previous/next one. we could use simple SEQUENCE (for min>max) to create this array but with SEQUENCE we cant go "back in time" (for max>min) so we use MMULT and negative number

    therefore, to start from a frame of the first date and create 3 weeks group by windows (eg. min>max) we use:

    =ARRAYFORMULA({MIN(A2:A); MIN(A2:A)+MMULT(TRANSPOSE((
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))<=TRANSPOSE(
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))*21); SiGN(
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))})
    

    and to get a reverse of it and start from frame of end date and create 3 weeks windows backwards (eg. max>min) we use:

    =ARRAYFORMULA({MAX(A2:A); MAX(A2:A)+MMULT(TRANSPOSE((
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))<=TRANSPOSE(
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))*-21); SiGN(
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))})
    

    enter image description here

    at this stage, we can start fixing the A column via VLOOKUP and 4th argument set to 1 - approximate mode (instead of 0 - exact match mode) so forward in time will be:

    =ARRAYFORMULA(IFNA(VLOOKUP(A2:A; SORT({MIN(A2:A); MIN(A2:A)+MMULT(TRANSPOSE((
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))<=TRANSPOSE(
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))*21); SIGN(
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))}); 1; 1)))
    

    and backward in time shall be:

    =ARRAYFORMULA(IFNA(VLOOKUP(A2:A; SORT({MAX(A2:A); MAX(A2:A)+MMULT(TRANSPOSE((
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))<=TRANSPOSE(
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))*-21); SIGN(
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))}); 1; 1)))
    

    enter image description here

    and now we just create a virtual array {} and pair fixed column A with column C and input it as range into QUERY

    side note:

    to put columns next to each other in english spreadsheets we use ,

    to put columns next to each other in non-english spreadsheets we use \

    =ARRAYFORMULA(QUERY({IFNA(VLOOKUP(A2:A; SORT({MIN(A2:A); MIN(A2:A)+MMULT(TRANSPOSE((
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))<=TRANSPOSE(
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))*21); SIGN(
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))}); 1; 1))\ C2:C}; 
     "select Col1,count(Col1) 
      where Col2 = FALSE 
      group by Col1 
      order by count(Col1) desc 
      label count(Col1)''"))
    

    and backwards in time:

    =ARRAYFORMULA(QUERY({IFNA(VLOOKUP(A2:A; SORT({MAX(A2:A); MAX(A2:A)+MMULT(TRANSPOSE((
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))<=TRANSPOSE(
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))*-21); SIGN(
     ROW(INDIRECT(MIN(A2:A)&":"&MAX(A2:A)))))}); 1; 1))\ C2:C}; 
     "select Col1,count(Col1) 
      where Col2 = FALSE 
      group by Col1 
      order by count(Col1) desc 
      label count(Col1)''"))
    

    enter image description here

    demo spreadsheet