I have a question very similar to one I asked a few weeks ago: Get all time unique values in BigQuery
I have a table like this:
ID | Day | Value |
---|---|---|
1 | 2021-09-01 | a |
2 | 2021-09-01 | b |
3 | 2021-09-01 | c |
4 | 2021-09-02 | d |
5 | 2021-09-02 | a |
6 | 2021-09-02 | a |
7 | 2021-09-02 | e |
8 | 2021-09-03 | c |
9 | 2021-09-03 | f |
10 | 2021-09-03 | a |
I'd like to count how many distinct values I have daily and all time, but the all-time uniqueness should only take earlier dates into account (the business logic is that I want to know whether a user is new). The difference from the previous question is that I'd like to keep every row and show the uniqueness per row, as a new column.
It's almost the same as what Google Analytics shows as a new or returning user. So if a user visits the site on 2021-09-02 and again on 2021-09-03, I'd like to see a New user on 2021-09-02 and a Returning user on 2021-09-03. This is the output I'd like to see:
ID | Day | Value | Type |
---|---|---|---|
1 | 2021-09-01 | a | New |
2 | 2021-09-01 | b | New |
3 | 2021-09-01 | c | New |
4 | 2021-09-02 | d | New |
5 | 2021-09-02 | a | Returning |
6 | 2021-09-02 | a | Returning |
7 | 2021-09-02 | e | New |
8 | 2021-09-03 | c | Returning |
9 | 2021-09-03 | f | New |
10 | 2021-09-03 | a | Returning |
I can do this when I check a single day, but not across the whole table, because each row has to be compared against the earlier dates.
It looks like you want to use analytic functions, which are detailed in this doc.
Using an analytic function's OVER clause with PARTITION BY, you can partition your data by value and then order each partition by date using ORDER BY. Then check whether a row is the first row in its partition and assign the type accordingly.
This query should get you what you want:
WITH data as(
SELECT "2021-09-01" day,"a" value
UNION ALL ( SELECT "2021-09-01", "b" )
UNION ALL ( SELECT "2021-09-01", "c" )
UNION ALL ( SELECT "2021-09-02", "d" )
UNION ALL ( SELECT "2021-09-02", "a" )
UNION ALL ( SELECT "2021-09-02", "a" )
UNION ALL ( SELECT "2021-09-02", "e" )
UNION ALL ( SELECT "2021-09-03", "c" )
UNION ALL ( SELECT "2021-09-03", "f" )
UNION ALL ( SELECT "2021-09-03", "a" )
)
SELECT day, value,
IF(ROW_NUMBER() OVER (PARTITION BY value ORDER BY day) = 1, 'New','Returning') as type
FROM data
Results
Row | day | value | type |
---|---|---|---|
1 | 2021-09-01 | a | New |
2 | 2021-09-02 | a | Returning |
3 | 2021-09-02 | a | Returning |
4 | 2021-09-03 | a | Returning |
5 | 2021-09-01 | b | New |
6 | 2021-09-01 | c | New |
7 | 2021-09-03 | c | Returning |
8 | 2021-09-02 | d | New |
9 | 2021-09-02 | e | New |
10 | 2021-09-03 | f | New |
The query above marks only one row per value as New. To give the New type to every row that shares the date of a value's first event, you can use another analytic function, FIRST_VALUE, and compare its result against the current row's date.
WITH data as
(SELECT "2021-09-01" day,"a" value
UNION ALL ( SELECT "2021-09-01","b")
UNION ALL ( SELECT "2021-09-01","c")
UNION ALL ( SELECT "2021-09-02","d")
UNION ALL ( SELECT "2021-09-02","a")
UNION ALL ( SELECT "2021-09-01","a")
UNION ALL ( SELECT "2021-09-02","a")
UNION ALL ( SELECT "2021-09-02","e")
UNION ALL ( SELECT "2021-09-03","c")
UNION ALL ( SELECT "2021-09-03","f")
UNION ALL ( SELECT "2021-09-03","a"))
SELECT *,
IF(
  ROW_NUMBER() OVER (PARTITION BY value ORDER BY day) = 1
  OR FIRST_VALUE(day) OVER (PARTITION BY value ORDER BY day) = day,
  'New', 'Returning') as type
FROM data
Result
Row | day | value | type |
---|---|---|---|
1 | 2021-09-01 | a | New |
2 | 2021-09-01 | a | New |
3 | 2021-09-02 | a | Returning |
4 | 2021-09-02 | a | Returning |
5 | 2021-09-03 | a | Returning |
6 | 2021-09-01 | b | New |
7 | 2021-09-01 | c | New |
8 | 2021-09-03 | c | Returning |
9 | 2021-09-02 | d | New |
10 | 2021-09-02 | e | New |
11 | 2021-09-03 | f | New |
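If you also want to keep the ID column and get the rows back in their original order, as in the output the question asks for, the same idea can be written with MIN as an analytic function. This is only a sketch under a few assumptions: the column names id, day and value are taken from the question, the inline data CTE stands in for your real table, and day is stored as a DATE rather than a string.

```sql
-- Sketch only: `data` is a stand-in for the real table; the column
-- names (id, day, value) are assumed from the question.
WITH data AS (
  SELECT 1 AS id, DATE '2021-09-01' AS day, 'a' AS value
  UNION ALL SELECT 2,  DATE '2021-09-01', 'b'
  UNION ALL SELECT 3,  DATE '2021-09-01', 'c'
  UNION ALL SELECT 4,  DATE '2021-09-02', 'd'
  UNION ALL SELECT 5,  DATE '2021-09-02', 'a'
  UNION ALL SELECT 6,  DATE '2021-09-02', 'a'
  UNION ALL SELECT 7,  DATE '2021-09-02', 'e'
  UNION ALL SELECT 8,  DATE '2021-09-03', 'c'
  UNION ALL SELECT 9,  DATE '2021-09-03', 'f'
  UNION ALL SELECT 10, DATE '2021-09-03', 'a'
)
SELECT
  id,
  day,
  value,
  -- A row is New when its day is the earliest day on which this value
  -- appears anywhere in the table; every later day is Returning.
  IF(MIN(day) OVER (PARTITION BY value) = day, 'New', 'Returning') AS type
FROM data
ORDER BY id
```

Because MIN(day) OVER (PARTITION BY value) has no ORDER BY, it looks at the whole partition, so every row on a value's first day is labelled New without needing ROW_NUMBER. With the sample data, e on 2021-09-02 comes out as New and c on 2021-09-03 as Returning.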