I'm working in Impala and, while I'm fairly inexperienced in both Impala and SQL, I need to be able to build a data set that looks like the following:
|dayname | 2017-11-08 00:00:00 | 2017-11-08 01:00:00 | ... |
|---------|---------------------+---------------------+-----|
|Wednesday| 20 | 11 | ... |
|---------|---------------------|---------------------|-----|
|Thursday | 287 | 17 | ... |
|---------|---------------------|---------------------|-----|
|... | ... | ... | ... |
|---------|---------------------|---------------------|-----|
I am unable, due to the constraints of Impala, to use pivot, which would under normal circumstances produce the desired result.
Thus far, I have a SQL SELECT statement which looks like this:
select
dayname(date) as dayname,
utc_hour,
sum(case when (`type` IN ('Awesome')) then 1 else 0 end) as some
FROM (select *, trunc(cast(floor(date / 1000) as timestamp), "HH") as utc_hour
FROM COOLNESSTYPES
WHERE date >= 1510082633596 and month >= '2017-11'
) a
GROUP BY utc_hour, dayname
ORDER BY utc_hour;
and returns the following data:
+-----------+---------------------+-------+
| dayname | utc_hour | some |
+-----------+---------------------+-------+
| Wednesday | 2017-11-08 00:00:00 | 20 |
| Wednesday | 2017-11-08 01:00:00 | 11 |
| Wednesday | 2017-11-08 09:00:00 | 1 |
| Wednesday | 2017-11-08 11:00:00 | 40 |
| Wednesday | 2017-11-08 12:00:00 | 0 |
| Wednesday | 2017-11-08 13:00:00 | 6 |
| Wednesday | 2017-11-08 14:00:00 | 0 |
| Wednesday | 2017-11-08 16:00:00 | 2 |
| Wednesday | 2017-11-08 17:00:00 | 10 |
| Wednesday | 2017-11-08 19:00:00 | 5 |
| Thursday | 2017-11-09 07:00:00 | 1 |
| Thursday | 2017-11-09 12:00:00 | 0 |
| Thursday | 2017-11-09 13:00:00 | 0 |
| Thursday | 2017-11-09 14:00:00 | 58 |
| Friday | 2017-11-10 09:00:00 | 0 |
| Friday | 2017-11-10 10:00:00 | 0 |
| Friday | 2017-11-10 16:00:00 | 0 |
+-----------+---------------------+-------+
So, how do I go about doing something like this? On Cloudera's community pages, someone recommends using unions, but I'm not really clear on how I'd label my columns as the row values from my utc_hour column. (see https://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Transpose-columns-to-rows/td-p/49667 for more information on the union suggestion, if needed.)
Any help or ideas on this would be greatly appreciated. Thanks!
There is added complexity if you really require column names that change. If you can tolerate fixed column names the pivot is simple, along these lines:
select
dayname
, extract(dow from utc_hour) d_of_w
, max(case when date_part('day', utc_hour) = 0 then somecol end) hour_0
, max(case when date_part('day', utc_hour) = 7 then somecol end) hour_7
, max(case when date_part('day', utc_hour) = 9 then somecol end) hour_9
, max(case when date_part('day', utc_hour) = 12 then somecol end) hour_12
, max(case when date_part('day', utc_hour) = 14 then somecol end) hour_14
from COOLNESSTYPES
group by
d_of_w
, dayname
I used Postgres to develop my example for this example using extract(hour from utc_hour)
instead of the date_part()
now shown above (thanks to hbomb).
| dayname | d_of_w | hour_0 | hour_7 | hour_9 | hour_12 | hour_14 |
|-----------|--------|--------|--------|--------|---------|---------|
| Wednesday | 3 | 20 | (null) | 1 | 0 | 0 |
| Friday | 5 | (null) | (null) | 0 | (null) | (null) |
| Thursday | 4 | (null) | 1 | (null) | 0 | 58 |
see: http://sqlfiddle.com/#!17/81cfd/2 (Postgres)
To achieve column names that change you need "dynamic sql" and to be frank it isn't clear to be if this is possible in Impala (as I don't use that product).