Search code examples
sqldatabasegoogle-bigqueryaggregate-functionswindow-functions

How to calculate average omitting duplicates per partition in SQL?


I want to calculate the average item count accounting for sub-partitions in each partition.

Sample Data:

id     session      item_count    random_field_1 
1      weoifn2      3             A
1      weoifn2      3             B
1      iuboiwe      2             K
2      oeino33      5             R
2      vergeeg      8             C
2      feooinn      9             P
2      feooinn      9             M

Logic:

  • id = 1: (3 + 2) / 2 = 2.5
  • id = 2: (5 + 8 + 9) / 3 = 7.33

Expected Output:

id      avg
1       2.5
2       7.33

My Query:

SELECT 
   id
 , AVG(item_count) OVER (PARTITION BY id) AS avg
FROM my_table

However, I believe this will factor in duplicates twice, which is unintended. How can I fix my query to only consider one item_count value per session?


Solution

  • Consider below approach

    select id, avg(item_count) as avg
    from (
      select distinct id, session, item_count
      from your_table
    )
    group by id           
    

    if applied to sample data in your question - output is

    enter image description here