I have a database of transactions, people, transaction dates, items, etc. Each time a person buys an item, the transaction is stored in the table like so:
personNumber, TransactionNumber, TransactionDate, ItemNumber
What I want to do is to find people (personNumber) who, from January 1st 2012(transactionDate) until March 1st 2012 have purchased the same ItemNumber multiple times within 14 days (configurable) or less. I then need to list all those transactions on a report.
Sample data:
personNumber, TransactionNumber, TransactionDate, ItemNumber
1 | 100| 2001-01-31| 200
2 | 101| 2001-02-01| 206
2 | 102| 2001-02-11| 300
1 | 103| 2001-02-09| 200
3 | 104| 2001-01-01| 001
1 | 105| 2001-02-10| 200
3 | 106| 2001-01-03| 001
1 | 107| 2001-02-28| 200
Results:
personNumber, TransactionNumber, TransactionDate, ItemNumber
1 | 100| 2001-01-31| 200
1 | 103| 2001-02-09| 200
1 | 105| 2001-02-10| 200
3 | 104| 2001-01-01| 001
3 | 106| 2001-01-03| 001
How would you go about doing that?
I've tried doing it like so:
select *
from (
select personNumber, transactionNumber, transactionDate, itemNumber,
count(*) over (
partition by personNumber, itemNumber) as boughtSame)
from transactions
where transactionDate between '2001-01-01' and '2001-03-01')t
where boughtSame > 1
and it gets me this:
personNumber, TransactionNumber, TransactionDate, ItemNumber
1 | 100| 2001-01-31| 200
1 | 103| 2001-02-09| 200
1 | 105| 2001-02-10| 200
1 | 107| 2001-02-28| 200
3 | 104| 2001-01-01| 001
3 | 106| 2001-01-03| 001
The issue is that I don't want TransactionNumber 107, since that's not within the 14 days. I'm not sure where to put in that limit of 14 days. I could do a datediff, but where, and over what?
Alas, the window functions in SQL Server 2005 just are not quite powerful enough. I would solve this using a correlated subquery.
The correlated subquery counts the number of times that a person purchased the item within 14 days after each purchase (and not counting the first purchase).
select t.*
from (select t.*,
(select count(*)
from t t2
where t2.personnumber = t.personnumber and
t2.itemnumber = t.itemnumber and
t2.transactionnumber <> t.transactionnumber and
t2.transactiondate >= t.transactiondate and
t2.transactiondate < DATEADD(day, 14, t.transactiondate
) NumWithin14Days
from transactions t
where transactionDate between '2001-01-01' and '2001-03-01'
) t
where NumWithin14Days > 0
You may want to put the time limit in the subquery as well.
An index on transactions(personnumber, itemnumber, transactionnumber, itemdate)
might help this run much faster.