I have about 100,000 user IDs who are visiting n number of levels. I need to find the overall median of the users and levels visited by each.
I've tried to use AVG based on the number of IDs on each level and the total IDs who started the application. The values are varying alot.
To find IDs who started the application.
SELECT
event_names, COUNT(DISTINCT id) uniques, COUNT(id) AS total
FROM xyz.analytics_111.xyz
WHERE (date BETWEEN "20191018" AND "20191024") AND version = "3.1" AND event_names in ("app_open","internet") AND platform = "ANDROID"
AND id IN ( SELECT DISTINCT id FROM abc.analytics_111.abc WHERE event_names = "internet" AND internet_status = 1 )
GROUP BY event_names
To find total users on each level.
SELECT event_names, story_name, level, COUNT(DISTINCT id) uniques, COUNT(id) AS total
FROM xyz.analytics_111.xyz WHERE (date BETWEEN "20191018" AND "20191024") AND version = "3.1" AND event_names in ("start_level","end_level") AND platform = "ANDROID" AND id IN ( SELECT DISTINCT id FROM abc.analytics_111.abc
WHERE event_names = "internet" AND internet_status = 1 )
GROUP BY event_names, story_name, level ORDER BY event_names DESC, story_name, level
After this I'm dividing the Sum of User ID count on all levels by the number of User ID who started the application to get the AVG of Levels visited by each user. Is there a way to find a median?
The question doesn't have enough details for a complete answer, but with the elements you've given us:
AVG
, when you want MEDIAN
To calculate a median you can do something like:
SELECT level_id, fhoffa.x.median(ARRAY_AGG(some_number))
FROM `table`
GROUP BY level_id
bqutil.fn.median()
is a public UDF we shared with the world:
Now, with the extra details you provided, if you want
to get the AVG of Levels visited by each user
, then:
SELECT AVG(levels) avg_levels_for_users
FROM (
SELECT id user, COUNT(DISTINCT level) levels
FROM `....`
GROUP BY user
)