Impala query - optimize a query to get the uniques for given key

I'm looking for ways to count unique users that have a specific pkey and also the count of unique users who didn't have that pkey.

Here is a sample table:

 userid |   pkey     | pvalue
------------------------------
   U1   |   x        | vx
   U1   |   y        | vy
   U1   |   z        | vz
   U2   |   y        | vy
   U3   |   z        | vz
   U4   |   null     | null

I get the expected results to get the unique users who has the pkey='y' and those who didn't using this query but turns out to be expensive:

WITH all_rows AS
  ( SELECT userid, 
           IF( pkey='y', pval, 'none' ) AS val,
           SUM( IF(pkey='y',1,0) ) AS has_key
   FROM some_table 
   GROUP BY userid, val)
SELECT val,
       count(distinct(userid)) uniqs
FROM all_rows
WHERE has_key=1
GROUP BY val
UNION ALL
SELECT 'no_key_set' val,
       count(distinct(userid)) uniqs
FROM all_rows a1 LEFT ANTI JOIN 
     all_rows a2 on (a1.userid = a2.userid and a2.has_key=1) 
GROUP BY val;

Results:

val        | uniqs
--------------------
vy         | 2
no_key_set | 2

I'm looking to avoid using any temp tables, so any better ways this can be achieved?

Thanks!

Solution

By using EXPLAIN, you can observe that most of the cost is spent on doing excessive GROUP BY aggregations rather than on using subqueries in your original query.

Here is a straightforward implementation

WITH t1 AS (
    SELECT pkey, COUNT(*) AS cnt
    FROM table
    WHERE pkey IS NOT NULL
    GROUP BY pkey
), t2 AS (
    SELECT COUNT(DISTINCT userid) AS total_cnt
    FROM table
)
SELECT
    CONCAT('no_', pkey) AS pkey,
    (total_cnt - cnt) AS cnt
FROM t1, t2
UNION ALL
SELECT * FROM t1

t1 gets a table of unique user count per pkey

+------+-----+
| pkey | cnt |
+------+-----+
| x    | 1   |
| z    | 2   |
| y    | 2   |
+------+-----+

t2 gets the number of total unique users

+-----------+
| total_cnt |
+-----------+
| 4         |
+-----------+

we can use the result from t2 to get the complement table of t1

+------+-----+
| pkey | cnt |
+------+-----+
| no_x | 3   |
| no_z | 2   |
| no_y | 2   |
+------+-----+

a final union of the two tables gives a result of

+------+-----+
| pkey | cnt |
+------+-----+
| no_x | 3   |
| no_z | 2   |
| no_y | 2   |
| x    | 1   |
| z    | 2   |
| y    | 2   |
+------+-----+