Search code examples
google-sheetsgoogle-sheets-query

Google Sheets: select top N cases within groups using QUERY


I want to make a report: "select salesmen who have N recent Status codes = 2,3,4,5".

A structural example of my data has 35 rows (including 1 header row): link. This file has a Date, Sales code (id of a salesman), Status code (id of how successful a transaction was) and other fields which are not necessary for the purpose.

I ended up using three formulas:

  1. a QUERY function with IMPORTRANGE.
    In the example data it is slightly simpler - take only Sales Codes, Status Code, and Date from another sheet, then order by Date, Sales Code. Formula in A9:
    =QUERY({Source!D1:E, Source!A1:A}, $B$4, 1)

  2. an additional column with a sequentional numbering. Formula in D10:
    =ArrayFormula(if(len(A10:A), ROW(A10:A) - MATCH(A10:A,A10:A,0) - 8, ))

  3. a QUERY function to extract only N cases (let's say 5). Formula in F9:
    =QUERY(A9:D, "select A, B where D <="&B3, 1)

Is there a way to combine all 3 steps into one so I get the output like in F10:G24 using one (and hopefully fast :)) formula? The formula I tried (expanded for readability):

=QUERY(
    {
      QUERY({Source!D1:E, Source!A1:A}, $B$4, 1),
      ArrayFormula(
         IF(len(J10:J),
            ROW(J10:J) - MATCH(J10:J, J10:J, 0) - 8,
         )
      )
    },
    "select Col1, Col2 where Col4 <="&B3,
    1
)

It gives me the error:

"Function ARRAY_ROW parameter 2 has mismatched row size. Expected: 28. Actual: 991."

I also tried finite data ranges in ROW() and MATCH() but that yields an empty table. In the original database there are ~3500 rows and they will expand, so I think I should stick to infinite ranges to automate data extraction.


Solution

  • Hm well it's a bit of a nightmare TBH - a similar question has cropped up before but no easy answer. Here is a draft which involves repeating the basic query several times-

    =ArrayFormula(query({query(A2:E,"select * where E>=2 and E<=5  order by D, A desc"),
    row(indirect("2:"&count(filter(E2:E,E2:E>=2,E2:E<=5))+1))-
    match(query(A2:E,"select D where E>=2 and E<=5  order by D, A desc"),
    query(A2:E,"select D where E>=2 and E<=5  order by D, A desc"),0)
    },"select * where Col6<=5"))
    

    But needs looking at more to see if it can be simplified.

    This is the full-column version including headers - I think it's OK

    =ArrayFormula(query({query(A:E,"select * where E>=2 and E<=5  order by D, A desc"),
    row(indirect("1:"&count(filter(E:E,E:E>=2,E:E<=5))+1))+1-
    match(query(A:E,"select D where E>=2 and E<=5  order by D, A desc"),
    query(A:E,"select D where E>=2 and E<=5  order by D, A desc"),0)
    },"select * where Col6<=5"))
    

    enter image description here