@Thorsten Kettner mentioned I didn't say what the problem is, so I wanted to add that. Right now the result is ordered by countries alphabetically, but I need it to be ordered by value of "ALL" for each country.
I've been playing with BigQuery public data sets, i.e. bigquery-public-data.geo_international_ports.world_port_index.
I'm selecting 'country' and 'harbor_size' columns and I'm calculating total count of country. I'm adding this as new row with value 'ALL' in column 'harbor_size'. I want to sort on this total count but in a way, where all country elements stay together. Original 'harbor_size' elements: ['S', 'M', 'L', 'V'].
Here's what I have so far: enter image description here
Query:
WITH
total_n AS
(
SELECT
country,
"ALL" AS harbor_size,
COUNT(country) as hb_count
FROM bigquery-public-data.geo_international_ports.world_port_index
GROUP BY country
ORDER BY country ASC, hb_count DESC
)
SELECT * FROM total_n
UNION ALL
SELECT
country,
harbor_size,
COUNT(country) as hb_count
FROM bigquery-public-data.geo_international_ports.world_port_index
GROUP BY country, harbor_size
ORDER BY country ASC, hb_count DESC;
So in the end, I'd like to have something like this:
country | h_size | count | remark |
---|---|---|---|
US | ALL | 10,000 | <- sort on this |
US | L | 7,000 | |
US | M | 2,000 | |
US | S | 1,000 | |
CA | ALL | 9,000 | <- sort on this |
CA | L | 6,000 | |
CA | M | 2,000 | |
CA | S | 1,000 | |
... | ... | ... | ... |
I was considering having temp table with only 'ALL' in 'harbor_size' then adding index after sorting, so that each country would have an index, then unioning this with the other table (country, harbor_size w/ all elements, count) and sorting on the index.
Your query sorts the countries alphabetically and even has the problem that countries having only one harbour size get their 'ALL' row and size row in arbitrary order, as they have the same count.
Instead you want to order the countries by the total harbour count descending and then by country and 'ALL' followed by the country's harbor sizes.
The ORDER BY
clause in your CTE is superfluous by the way, because subquery results are unordered data sets. The DBMS is free to ignore this clause altogether.
You can get the total with GROUPING SETS, as you want to group by country and horbour size and also by country. This makes the query simpler. To get the countries ordered by their total count, use a window function (SUM(COUNT(*)) OVER (PARTITION BY country)
).
SELECT
country,
COALESCE(harbor_size, 'ALL') AS hb_size,
COUNT(*) as hb_count
FROM bigquery-public-data.geo_international_ports.world_port_index
GROUP BY GROUPING SETS ( (country, harbor_size), (country) )
ORDER BY
SUM(COUNT(*)) OVER (PARTITION BY country) DESC, -- countries with many harbours first
country ASC, -- get countries with the same harbour count in alphabetical order
hb_count DESC, -- within each country get the harbour sizes ordered by their count
hb_size ASC; -- and get 'ALL' before the harbour size row, for countries with just one size