Search code examples
sql-server-2008group-bycountpercentagecorrelated-subquery

SQL: Select count and percentage with conditions


I have two tables

User

   id_user|INT
   company | varchar

and

Log

log_id|int
id_user|int

I need to return the company, the total number of users per company, and the percentage of users that have atleast 3 logs

I can run this query to get the company and counts

select company, count (*) as 'Count'
from user 
group by company

which returns this

Apple| 7
Google| 6

But I am having trouble figuring out how to then return an extra column that displays the percentage of those users that have at least 3 logs. For example, If there were 2 users who had more than 3 logs from Apple and one user from Google who had more than 3 logs, the answer would look like this:

Apple| 7| 29% (because 2/7=~29%)
Google| 6| 17% (because 1/7=~17%)

I figured this requires the use of windows function or some type of correlated subquery but I'm having issues accurately obtaining the correct percentage.

Any help would be greatly appreciated. (using SQL server 2008)


Solution

  • I was actually able to do this without using window functions, though there is probably a version which would use them. First, I aggregate the number of logs per user in a CTE. Then, I join the user table to this, using conditional aggregation to count the number of users per company having 3 logs or more.

    WITH cte AS (
        SELECT id_user, COUNT(*) AS cnt
        FROM Log
        GROUP BY id_user
    )
    
    SELECT
        u.company,
        COUNT(DISTINCT u.id_user) AS total_users,
        100.0 * SUM(CASE WHEN c.cnt >= 3 THEN 1 ELSE 0 END) /
            COUNT(DISTINCT u.id_user) AS log_3_users
    FROM [User] u
    LEFT JOIN cte c
        ON u.id_user = c.id_user
    GROUP BY
        u.company;
    

    enter image description here

    Demo

    Note that in the demo I just have some dummy data, where 1 out of 3 Google users has 3 or more logs, and 1 out of 2 Microsoft employees has 3 or more logs.