Given Merchant ID, Card Number and Date, I need to count the number of Card Numbers associated to each Merchant ID per month. So far I can only count the number of Card Numbers per Merchant ID but I cannot count it yet per Month using this code:
HitsCounter <- summarise(group_by(df, MerchantID), HitsTotal = n_distinct(CardNum))
Reproducible Data:
MerchantID <- c('1234, '1234', '1234', '1234')
CardNum <- c('1abc1', '1abc1', '2xyz2', '3ijk3')
Date <- c('2020-05-07', '2020-05-07', '2019-10-12', '2019-10-25')
df <- data.frame(MerchantID, CardNum, Date)
What I get when I merge HitsCounter to my dataframe df:
MerchantID CardNum Date HitsTotal
1234 1abc1 2020-05-07 3
1234 1abc1 2020-05-07 3
1234 2xyz2 2019-10-12 3
1234 3ijk3 2019-10-25 3
Expected Output Data:
MerchantID CardNum Date HitsPerMonth
1234 1abc1 2020-05-07 2
1234 1abc1 2020-02-07 1
1234 2xyz2 2020-05-12 2
1234 3ijk3 2019-10-25 1
1234 3ijk3 2019-10-01 1
Take note that the logic should completely disregard any duplicate CardNumber even with a different month since a "Hit" is counted per unique CardNumber.
To put it simply, it should answer the question:
How many Unique Card Numbers transacted with the same Merchant ID in the same month?
You can extract year-month value from date and count the unique CardNum
values for each MerchantID
and month
.
library(dplyr)
df %>%
mutate(Date = as.Date(Date),
month = format(Date, "%Y-%m")) %>%
group_by(MerchantID, month) %>%
mutate(HitsPerMonth = n_distinct(CardNum))
If a CardNum
has to be calculated only once for each MerchantID
irrespective of the month then you can do :
df %>%
mutate(Date = as.Date(Date),
month = format(Date, "%Y-%m")) %>%
group_by(MerchantID) %>%
mutate(CardNum = replace(CardNum, duplicated(CardNum), NA)) %>%
group_by(month, .add =TRUE) %>%
mutate(HitsPerMonth = n_distinct(na.omit(CardNum)))