Search code examples
mysqlsum

MYSQL sum() for distinct rows


I'm looking for help using sum() in my SQL query:

SELECT links.id, 
       count(DISTINCT stats.id) as clicks, 
       count(DISTINCT conversions.id) as conversions, 
       sum(conversions.value) as conversion_value 
FROM links 
LEFT OUTER JOIN stats ON links.id = stats.parent_id 
LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
GROUP BY links.id 
ORDER BY links.created desc;

I use DISTINCT because I'm doing "group by" and this ensures the same row is not counted more than once.

The problem is that SUM(conversions.value) counts the "value" for each row more than once (due to the group by)

I basically want to do SUM(conversions.value) for each DISTINCT conversions.id.

Is that possible?


Solution

  • I may be wrong but from what I understand

    • conversions.id is the primary key of your table conversions
    • stats.id is the primary key of your table stats

    Thus for each conversions.id you have at most one links.id impacted.

    You request is a bit like doing the cartesian product of 2 sets :

    [clicks]
    SELECT *
    FROM links 
    LEFT OUTER JOIN stats ON links.id = stats.parent_id 
    
    [conversions]
    SELECT *
    FROM links 
    LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
    

    and for each link, you get sizeof([clicks]) x sizeof([conversions]) lines

    As you noted the number of unique conversions in your request can be obtained via a

    count(distinct conversions.id) = sizeof([conversions])
    

    this distinct manages to remove all the [clicks] lines in the cartesian product

    but clearly

    sum(conversions.value) = sum([conversions].value) * sizeof([clicks])
    

    In your case, since

    count(*) = sizeof([clicks]) x sizeof([conversions])
    count(*) = sizeof([clicks]) x count(distinct conversions.id)
    

    you have

    sizeof([clicks]) = count(*)/count(distinct conversions.id)
    

    so I would test your request with

    SELECT links.id, 
       count(DISTINCT stats.id) as clicks, 
       count(DISTINCT conversions.id) as conversions, 
       sum(conversions.value)*count(DISTINCT conversions.id)/count(*) as conversion_value 
    FROM links 
    LEFT OUTER JOIN stats ON links.id = stats.parent_id 
    LEFT OUTER JOIN conversions ON links.id = conversions.link_id 
    GROUP BY links.id 
    ORDER BY links.created desc;
    

    Keep me posted ! Jerome