I am trying to count the total values for a specific column for three tables in Impala/Hive, but I can only seem to get the total values for each table. For example, I am receiving the count for Poland for each table instead of the count for Poland for all three tables combined. I have tried to union the tables together, but I have been unsuccessful. Listed below is the coding I have used.
SELECT table1.country, COUNT(*)
FROM table1
GROUP BY table1.country
UNION
SELECT table2.country, COUNT(*)
FROM table2
GROUP BY table2.country
UNION
SELECT table3.country, COUNT(*)
FROM table3
GROUP BY table3.country
ORDER BY COUNT(country) DESC;
Use UNION ALL
instead of UNION
:
SELECT table1.country, COUNT(*)
FROM table1
GROUP BY table1.country
UNION ALL
SELECT table2.country, COUNT(*)
FROM table2
GROUP BY table2.country
UNION ALL
SELECT table3.country, COUNT(*)
FROM table3
GROUP BY table3.country
ORDER BY COUNT(country) DESC;
UNION
removes duplicates, so if two tables have the same count for a country, then the duplicate is removed.
EDIT:
If you want one row per country, use a subquery and reaggregate:
SELECT country, SUM(cnt)
FROM (SELECT table1.country, COUNT(*) as cnt
FROM table1
GROUP BY table1.country
UNION ALL
SELECT table2.country, COUNT(*)
FROM table2
GROUP BY table2.country
UNION ALL
SELECT table3.country, COUNT(*)
FROM table3
GROUP BY table3.country
) t
GROUP BY country;