I'm looking for a way to count the unique users that have a specific pkey, and also the unique users that don't have that pkey.
Here is a sample table:
userid | pkey | pvalue
------------------------------
U1 | x | vx
U1 | y | vy
U1 | z | vz
U2 | y | vy
U3 | z | vz
U4 | null | null
The following query gives the expected counts of unique users who have pkey='y' and of those who don't, but it turns out to be expensive:
WITH all_rows AS (
    SELECT userid,
           IF(pkey = 'y', pvalue, 'none') AS val,
           SUM(IF(pkey = 'y', 1, 0)) AS has_key
    FROM some_table
    GROUP BY userid, val
)
SELECT val,
       COUNT(DISTINCT userid) AS uniqs
FROM all_rows
WHERE has_key = 1
GROUP BY val
UNION ALL
SELECT 'no_key_set' AS val,
       COUNT(DISTINCT userid) AS uniqs
FROM all_rows a1
LEFT ANTI JOIN all_rows a2
    ON (a1.userid = a2.userid AND a2.has_key = 1)
GROUP BY val;
Results:
val | uniqs
--------------------
vy | 2
no_key_set | 2
I'd like to avoid using any temp tables. Is there a better way to achieve this?
Thanks!
By using EXPLAIN, you can observe that most of the cost in your original query is spent on the excessive GROUP BY aggregations rather than on the subqueries.
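As a rough illustration of that check, you can prefix a query with EXPLAIN. This is only a sketch: it assumes a Hive- or Spark-SQL-style engine (the question's IF and LEFT ANTI JOIN suggest one) and reuses the table name some_table from the question. The plan format differs between engines and versions, so no output is shown.

```sql
-- Sketch: inspect the plan of the per-user aggregation from the question.
-- Assumes a Hive/Spark-SQL-style engine and the question's table name.
EXPLAIN
SELECT userid,
       SUM(IF(pkey = 'y', 1, 0)) AS has_key
FROM some_table
GROUP BY userid;
```

Look for the aggregation and shuffle/exchange stages in the plan; each extra GROUP BY typically adds one.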
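Here is a straightforward implementation:
-- t1: number of unique users per pkey (NULL keys excluded)
WITH t1 AS (
    SELECT pkey, COUNT(DISTINCT userid) AS cnt
    FROM some_table
    WHERE pkey IS NOT NULL
    GROUP BY pkey
),
-- t2: total number of unique users, including those with no key at all
t2 AS (
    SELECT COUNT(DISTINCT userid) AS total_cnt
    FROM some_table
)
-- complement: users that do NOT have each pkey
SELECT
    CONCAT('no_', pkey) AS pkey,
    (total_cnt - cnt) AS cnt
FROM t1 CROSS JOIN t2
UNION ALL
SELECT pkey, cnt FROM t1;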
t1 gets the unique user count per pkey:
+------+-----+
| pkey | cnt |
+------+-----+
| x | 1 |
| z | 2 |
| y | 2 |
+------+-----+
t2 gets the total number of unique users:
+-----------+
| total_cnt |
+-----------+
| 4 |
+-----------+
We can use the result from t2 to get the complement of t1, that is, total_cnt - cnt for each pkey:
+------+-----+
| pkey | cnt |
+------+-----+
| no_x | 3 |
| no_z | 2 |
| no_y | 2 |
+------+-----+
A final UNION ALL of the two tables gives:
+------+-----+
| pkey | cnt |
+------+-----+
| no_x | 3 |
| no_z | 2 |
| no_y | 2 |
| x | 1 |
| z | 2 |
| y | 2 |
+------+-----+
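If only a single key such as 'y' matters, as in the question, conditional aggregation is another option. Note that this is a different technique from the query above, offered as a minimal sketch. It assumes the table is named some_table as in the question and that the engine accepts a CASE expression in GROUP BY; the CTE name per_user is made up for illustration.

```sql
-- Sketch: flag each user once, then count users per flag.
WITH per_user AS (
    SELECT userid,
           -- 1 if the user has at least one row with pkey = 'y', else 0
           MAX(CASE WHEN pkey = 'y' THEN 1 ELSE 0 END) AS has_key
    FROM some_table
    GROUP BY userid
)
SELECT CASE WHEN has_key = 1 THEN 'y' ELSE 'no_y' END AS pkey,
       COUNT(*) AS cnt
FROM per_user
GROUP BY CASE WHEN has_key = 1 THEN 'y' ELSE 'no_y' END;
```

On the sample table this should give 2 for y (U1, U2) and 2 for no_y (U3, U4). U4 falls into no_y because its NULL pkey never matches 'y'. Since per_user has exactly one row per userid, COUNT(*) is enough and no COUNT(DISTINCT ...) is needed in the outer query.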