Search code examples
runique

How to collect unique values, and sum across other columns with conditions


I have a lot of financial trading data with around a million rows and I want to be able to condense this into a new data frame with a list of Unique UserIDs. I then want to be able to add up the "trades" for their account, with some conditions, ie if TransactionTypeId == 2 & AC_Type== 19. I would use a sumifs in excel for this but the size of the file means its pretty much impossible to run on my computer.

df<- structure(list(UserId = c(1, 1, 1, 1, 2, 
2, 2, 3, 3, 3, 4, 5, 6, 
6, 6, 7, 7, 7, 8, 8, 8, 
8, 8, 9, 9, 9, 10, 11, 12, 
12, 13, 13, 13, 14, 14, 15, 15, 
16, 16, 16), TransactionTypeId = c(14, 1, 1, 70, 
15, 1, 1, 14, 1, 1, 70, 14, 14, 1, 1, 14, 1, 1, 14, 1, 1, 1, 
1, 14, 1, 1, 14, 14, 1, 1, 14, 1, 1, 1, 1, 70, 70, 14, 1, 1), 
    AC_Type = c(21, 21, 21, 21, 19, 19, 19, 19, 19, 19, 19, 19, 
    19, 19, 19, 21, 21, 21, 19, 19, 19, 19, 19, 19, 19, 19, 20, 
    19, 19, 19, 20, 20, 20, 20, 20, 20, 20, 20, 20, 20), Trades = c(30, 
    30, 0.00067116, 0.00067115, 249, 249, 0.00533033, 48.75, 
    48.75, 0.00101298, 0.00533, 24.37, 146.25, 146.25, 0.00309109, 
    100.01, 100.01, 0.00233551, 97.5, 90, 0.00189134, 5, 0.00245851, 
    234, 234, 0.00500802, 100.01, 48.75, 48.5, 0.0275474, 24, 
    24, 0.00051975, 100, 0.00223998, 0.00051975, 0.00205, 9.75, 
    8.75, 0.00017811)), row.names = c(NA, -40L), class = c("tbl_df", 
"tbl", "data.frame"))

Solution

  • You can take sum of the logical condition that you want to count.

    library(dplyr)
    
    df %>%
      group_by(UserId) %>%
      summarise(count = sum(Trades[TransactionTypeId == 2 & AC_Type== 19]))