Suppose I have the following table table1
in Amazon Athena (technically, in Glue):
torg, foo, bar, x1, x2, x3, baz
-------------------------------
t1, 1, NULL, NULL, NULL, NULL, goober
t2, 1, NULL, NULL, NULL, NULL, NULL -- want to ignore this row entirely, even though foo=1
t3, 1, tronic, NULL, NULL, NULL, NULL
...
t4, 2, NULL, NULL, NULL, NULL, horse
t4, 2, zebra, NULL, NULL, NULL, NULL
...
Is there an Athena query that can group by
foo
(in this example foo=1
), but ignoring NULL
column values for each row, such that a result would look like?
foo, bar, baz
----------------
1, tronic, goober
For context, I am writing the as partitioned parquet to S3 at different times. It's wonderful using Athena to query this data, but I'm having a hard time envisioning if Athena is the right place to "collapse" this columnar data, into more "row-wise" data grouped by a particular column (or set of columns).
While I will likely further work with this data in pandas, and could "collapse" it there, I was hoping there might be a query pattern in Athena. Or perhaps I'm thinking about this entirely wrong...
Any insight greatly appreciated.
If you have only one value per group then you can use max
(or min
) and group by
:
WITH dataset(foo,bar, baz) AS (
VALUES
(1,NULL,'tronic'),
(1,'goober',NULL),
(1,NULL, NULL)
)
SELECT foo, max(bar) bar, max(baz) baz
FROM dataset
GROUP BY foo
Output:
foo | bar | baz |
---|---|---|
1 | goober | tronic |