Search code examples
google-sheetsgoogle-query-languagegoogle-sheets-query

Google Sheets Query Group By / First-N-Per-Group


I'm trying to find a simple solution for first-n-per-group.

I have a table of data, first column dates and rest data. I want to group based around the date, as multiple entries per date are allowed. For the second column some numbers, but want the FIRST record.

Currently the aggregate function I could possibly use is MIN() but that will return the lowest value and not the first.

A            B
01/01/2018   10
01/01/2018   15
02/01/2018   10
02/01/2018   2
02/01/2018   100
02/01/2018   20
03/01/2018   5
03/01/2018   2

Desired output

A            B
01/01/2018   10
02/01/2018   10
03/01/2018   5

Current results using MIN() - undesired

A            B
01/01/2018   10
02/01/2018   2
03/01/2018   2

It's a shame there isn't a FIRST() aggregate function in Google Sheets, which would make this a lot easier.

I saw a couple of examples of using the Row Number and ArrayQuery, but that doesn't seem to work for me. There are about 5000 rows of data so trying to keep this as efficient as possible, and not have to recalculate the entire sheet on any change, each taking a few seconds.

Currently I have this, which appends a third column with the Row Number:

=query({A1:B, arrayformula(row(A1:B))}, "select min(Col1),min(Col2) group by Col1")

Thanks


EDIT 1


A suggested solution was =SORTN(A:B,2^99,2,1,1), which is a clean simple one. However, this requires a large range of "free space" to display the returned dataset. Imagine 3000+ rows.

I was hoping for a QUERY() -based solution, as I wanted to do further operations with the results. Specifically, count the occurrences of distinct values.

For example: I wanted a returned dataset of

A            B
01/01/2018   10
02/01/2018   10
03/01/2018   5

Yet I want to count the occurrences of those values (and then ignoring the dates). For example:

B    C
10   2
5    1

Perhaps I've confused the situation by using numbers? the "data" in ColB is TEXT (short 3 letter codes), however I used numbers to show I couldn't use MIN() function as that returns the numerically lowest value.

So in brief:

  • Go through all rows (3000+ rows) and group by the FIRST row of a particular date
  • return the FIRST value of that row
  • COUNT() all unique occurrences of those FIRST values, disregarding the date. Just a list with the unique values and their count (again, only the first one of any particular day)

Solution

  • =SORTN(A:B,2^99,2,1,1)
    

    If your data is sorted as in the sample, You can easily remove duplicates with SORTN()