I'm working with Hive for the first time and it doesn't support pivot. So I'm trying to either do a series of self-joins or column expressions to get what need. I'm running into errors.
My first table (Employees) looks like this. There's four categorical groups in the role column that I'd like to turn into columns using group bys. One for C-Suite, one for D-Suite, etc. with the alias of the employees as the values.
alias | name | role | oversight | functional_area |
---|---|---|---|---|
jbob | Jo Bob | C-Suite | Finance | Orders |
jndo | Jon Doe | D-Suite | Finance | Orders |
jndo | Jon Doe | D-Suite | E-Commerce | Orders |
tweb | Tim Webb | SME | Analytics | Fraud |
kpow | Kat Power | C-Suite | Controls | Architecture |
lguan | Lin Guan | D-Suite | Sales | Marketing |
phar | Pat Harms | Manager | HR | HR |
This is what I'm working with:
SELECT
U.alias
U.name
U.role
U.oversight
U.functional_are
FROM employee U
LEFT OUTER JOIN
(SELECT
c.alias, c.oversight
FROM employee c
WHERE role = 'C-Suite' AND oversight <> 'N/A'
GROUP BY oversight, alias)
ON U.alias = c.alias
And I'm getting error messages on the join when I try and just self-join c-suite. I also get an error when I try and put c.alias as c_suite in the select statement and not do a subquery. In R this is done with gather and spread, but I'm getting confused with Hive syntax.
For anyone else coming across this question, I also have a piece of proprietary software in addition to Hive that I'm using to make extracts and it cherrypicks SQL syntax. Some functions work, some don't. CASE worked for me. Not the cleanest query, but it works like pivot. I'm able to make joins after from statement.
SELECT
CASE WHEN U.role = 'C-Suite' THEN U.alias
ELSE 'NULL' END AS "C_Alias",
CASE WHEN U.role = 'D-Suite' THEN U.alias
ELSE 'NULL' END AS "D_Alias",
CASE WHEN U.role = 'SME' THEN U.alias
ELSE 'NULL' END AS "SME_Alias",
CASE WHEN U.role = 'Manager' THEN U.alias
ELSE 'NULL' END AS "Manager_Alias",
U.role
,U.name
,U.oversight
,U.functional_area
FROM employees U