Search code examples
google-sheetsgoogle-sheets-formulagoogle-sheets-query

QUERY to merge the names of two regions and sum the number of their occurrences


I have an array with states/regions in the UK. Some regions occur more than once in this list, so I have performed a COUNTIF to determine the number of times that each one occur.

Now I need to run a QUERY to list top 5 regions.

Generally, most occurrences are for the London area.

The issue is that in the regions there are 2 states that refer to the Greater London area - London and Greater London.

These two I need to merge and sum their values. There needs to be only one region - Greater London, and its value needs to hold the sum of London and Greater London.

This is the dataset I have:

+----------------+-------+
| State/Province | count |
+----------------+-------+
| Hampshire      | 1     |
+----------------+-------+
| Kent           | 2     |
+----------------+-------+
| West Lothian   | 3     |
+----------------+-------+
| London         | 4     |
+----------------+-------+
| Greater London | 5     |
+----------------+-------+
| Cheshire       | 6     |
+----------------+-------+

I have managed to put together this QUERY so far:

=QUERY(A1:B,"select A, max(B) group by A order by max(B) desc limit 5 label max(B) 'Number of occurrences'",1)

That gives me this output:

+----------------+-----------------------+
| State/Province | Number of occurrences |
+----------------+-----------------------+
| Cheshire       | 6                     |
+----------------+-----------------------+
| Greater London | 5                     |
+----------------+-----------------------+
| London         | 4                     |
+----------------+-----------------------+
| West Lothian   | 3                     |
+----------------+-----------------------+
| Kent           | 2                     |
+----------------+-----------------------+

What I need is the Greater London and London entries to be merged under the name Greater London and their numbers of occurrences to be summed, providing this result:

+----------------+-----------------------+
| State/Province | Number of occurrences |
+----------------+-----------------------+
| Greater London | 9                     |
+----------------+-----------------------+
| Cheshire       | 6                     |
+----------------+-----------------------+
| West Lothian   | 3                     |
+----------------+-----------------------+
| Kent           | 2                     |
+----------------+-----------------------+
| Hampshire      | 1                     |
+----------------+-----------------------+

Apologies for not sharing a sheet, but I have security restrictions that are not allowing me to share any link to sheet outside the firm.


Solution

  • =QUERY(QUERY(ARRAYFORMULA(
     {SUBSTITUTE(IF(A1:A="London","♥",A1:A),"♥","Greater London"),B1:B}),
     "select Col1, sum(Col2) 
      where Col1 is not null 
      group by Col1"),
     "select Col1, max(Col2) 
      group by Col1 
      order by max(Col2) desc 
      limit 5 
      label max(Col2)'Number of occurrences'",1)
    

    0