I'm trying to create a query which does something a little complex, and I haven't been able to find anything that might point me in the right direction. Perhaps YOU can help!
This is the source data:
7457, "05:06:26 UTC", 15
7457, "05:06:26 UTC", 15
7457, "05:06:26 UTC", 15
7457, "05:06:26 UTC", 15
2341, "05:12:34 UTC", 10
2341, "05:12:34 UTC", 10
2341, "05:12:34 UTC", 10
2341, "05:12:34 UTC", 10
5678, "05:12:34 UTC", 15
5678, "05:12:34 UTC", 15
5678, "05:12:34 UTC", 15
5678, "05:12:34 UTC", 15
5678, "05:12:34 UTC", 15
5678, "05:12:34 UTC", 15
5678, "05:12:34 UTC", 15
5678, "05:12:34 UTC", 15
5678, "05:12:39 UTC", 15
5678, "05:12:39 UTC", 15
1111, "06:00:00 UTC", 10
2222, "07:00:00 UTC", 15
3333, "08:00:00 UTC", 10
I have a query to find duplicated stats:
SELECT userID, timestamp, statType, COUNT(*) - 1 AS DuplCount
FROM [dataset1.table1]
GROUP BY userID, timestamp, statType
HAVING DuplCount > 0;
(Note that only stats that have the same userID and timestamp can be considered duplicates.)
This results in a table that looks like
userID timestamp statType DuplCount
7457 05:06:26 UTC 15 3
2341 05:12:34 UTC 10 3
5678 05:12:34 UTC 15 7
5678 05:12:39 UTC 15 1
I want to further consolidate this data so it can inserted as one row in another table: the sum of the counts of duplicates of the same statType. I want it to look something like
table stat10DuplCount stat15DuplCount
dataset1.table1 3 11
I'm not sure how to proceed... can this all be done in one query (preferred), or do I need to do multiple queries or do some post-query data processing?
I have figured out how to do exactly what I want; the only difference between this query and Felipe's is that it takes the sum of the duplicates instead of counting each set of duplicates as one occurrence.
SELECT "dataset1.table1" table, SUM(IF(statID=10,DuplCount,null)) stat10DuplCount, SUM(IF(statID=15,DuplCount,null)) stat15DuplCount,
FROM (
SELECT userID, timestamp, statType, COUNT(*) - 1 AS DuplCount
FROM [dataset1.table1] AS statsTable
GROUP BY userID, timestamp, statType
HAVING DuplCount > 0
);
Which results in:
table stat10DuplCount stat15DuplCount
dataset1.table1 3 11