Search code examples
google-sheetsgoogle-sheets-formulacountinggoogle-sheets-querygoogle-query-language

Build a table with total unique values for each group in Google Spreadsheet


I am using google spreadsheets.

Imagine there are some families of insects with the following names in column A, and their genera in column B and C imaginary names for species (they are unique FOR each genus).

What I need is to have for EACH FAMILY (column A) the UNIQUE number of genus. I'll give the desired result after this table 1.

NOTE: Culicidae has 31 rows

Table 1- Original table

   A           B               C 
    ---------------------------------
1   Culicidae   Aedes           X
2   Culicidae   Aedes           Y
    Culicidae   Aedes           Z
    Culicidae   Aedes           W
    Culicidae   Aedes           E  
    Culicidae   Anopheles       X
    Culicidae   Anopheles       Y
    Culicidae   Anopheles       Z
    Culicidae   Anopheles       C
    Culicidae   Anopheles       V
    Culicidae   Anopheles       I
    Culicidae   Anopheles       P
    Culicidae   Anopheles       L
    Culicidae   Anopheles       U
    Culicidae   Coquillettidia  A
    Culicidae   Culex           X
    Culicidae   Culex           Y 
    Culicidae   Culex           Z
    Culicidae   Culex           V
    Culicidae   Culex           B
    Culicidae   Culex           N
    Culicidae   Culex           M
    Culicidae   Culex           O
    Culicidae   Culex           P
    Culicidae   Culiseta        A 
    Culicidae   Culiseta        B
    Culicidae   Culiseta        C
    Culicidae   Culiseta        T
    Culicidae   Culiseta        Y
    Culicidae   Orthopodomyia   W 
    Culicidae   Uranotaenia     A
32  Tephritidae Tephritis       X

Table 2. Desired result

Family            Total Number of Unique Genera
--------------------------------------------------
Culicidae                 7
Tephritidae               1

Culicidae has 7 unique values for column B, hence the final desired result in another new table (table 2 in another different sheet) should be 7 for this case.

So we need first to find the range for Culicidae (it is dynamic - today the values for Culicidae can range from A1 to A31, and tomorrow from A100 to A130 or whatever, and can increase or decrease with the time, ie, today has 31 rows, tomorrow can have 90 rows or 29 rows...). Ideally, it should group each family automatically in column A for the TABLE 2.

Then for Culicidae it should find out how many unique genera are in column B, THEY ARE 7 unique values. And that's it. I hope this clears out.

I know that to find the unique values should be something like (but this range is static.. and not dynamic as desired...):

=SUMPRODUCT(1/countif($B$1:$B$31,$B$1:$B$31))

The main difficulty is to achieve the right range ($B$1:$B$31 or where it is the data for Culicidae) IN only one formula (that can be dynamic, and to group each family as shown in Table 2).


Solution

  • use formula:

    =QUERY(SORTN(A1:B, 99^99, 2, 2, 1), 
     "select Col1,count(Col1) 
      where Col1 is not null
      group by Col1
      label count(Col1)''")
    

    enter image description here