SQL Server 2008 R2
I have a table called Actions, this would be a snippet of what it looks like
ActionID | ActionType | ActionUserID | ActionDateTime
---------+------------+--------------+---------------------
555363 Open 9843 2020-09-15 09:27:55
555364 Process 2563 2020-09-15 09:31:22
555365 Close 8522 2020-09-15 09:37:48
555366 Detour 9843 2020-09-15 09:42:42
555367 Process 9843 2020-09-15 09:51:50
555368 Close 8522 2020-09-15 09:55:45
555369 Open 1685 2020-09-15 09:57:12
555370 Detour 2563 2020-09-15 10:03:23
555371 Detour 9843 2020-09-15 10:04:33
555372 Close 8522 2020-09-15 10:07:44
The table has hundreds of thousands of rows. What I want to do is review 1% of all actions performed by each user for a specific month.
I know I can get 1% of everything by doing:
SELECT TOP 1 PERCENT *
FROM Actions
WHERE ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020'
ORDER BY NEWID()
I know I can get 1% of a specific user by doing:
SELECT TOP 1 PERCENT *
FROM Actions
WHERE ActionUserID = 9843
AND ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020'
ORDER BY NEWID()
But what I really want to get is 1% of each user. I know I could get a list of the users who performed actions during the month by doing:
SELECT DISTINCT(ActionUserID)
WHERE ActionDateTime BETWEEN '09/01/2020' AND '09/30/2020'
However I am not sure how to combine these two queries.
But what I really want to get is 1% of each user.
I would recommend window function percent_rank()
:
select *
from (
select a.*, percent_rank() over(partition by actionuserid order by newid()) prn
from actions a
where actiondatetime >= '20200901' and actiondatetime < '20201001'
) a
where prn < 0.01
If your version of SQL Server is so old that it does not support percent_rank()
, then we can emulate it with rank()
and count()
:
select *
from (
select a.*,
rank() over(partition by actionuserid order by newid()) as rn,
count(*) over(partition by actionuserid) as cnt
from actions a
where actiondatetime >= '20200901' and actiondatetime < '20201001'
) a
where 100.0 * rn / cnt < 1 or (rn = 1 and cnt < 100)