I need to find the percentage of logs in my table that are duplicated. Therefore I did a query with a "having" that checks if the key was duplicated. The problem is that after doing this "having" I lost all the logs that were not duplicated.
Here is the table:
Here is my query:
(SELECT count(params_advertiserId) AS duplicates
FROM android_clicks
GROUP BY params_advertiserId ,app_id ,date --my key is a triplet
HAVING COUNT(params_advertiserId) > 1)
Help would be appreciated.
It this what you want?
select (count(*) - count(distinct params_advertiserId, app_id, date)) / count(*) as duplicate_ratio
from android_clicks ac;
Your query is incorrect because AND
is used for boolean expressions. So the result of the GROUP BY
expression is true, false, or NULL
.
If you want the count, then wrap it as a subquery:
SELECT COUNT(*) as num_duplicates
FROM (SELECT params_advertiserId, app_id, date AS duplicates
FROM android_clicks ac
GROUP BY params_advertiserId, app_id, date
HAVING COUNT(*) > 1
);