I have three columns "Name", "success Dummy" and "Date". For each NAME I want to check the PAST SUCCESS for that NAME.
So for example if name "Peter" has occurred three times, for each time I want to count the number of "Peter" with "Success ==1" and Date happened before.
Example of the output that I need to get for "Past Success" column.
Name Success Date Past Success
David 1 2018 1
Peter 0 2017 3
Peter 1 2016 2
David 1 2017 0
Peter 1 2015 1
Peter 0 2010 1
Peter 1 2005 0
Peter NA 2004 0
Is there any way to do it fast?
Also I need it to be very fast because my data is huge.
What I did is I sorted my data based on Names and Dates and check each observation against 100 observations before (because max of frequency of Names is 100).
Please advise if there is a better way to do that.
Here are two ways to do it. One of them is almost as @FALL Gora, but the other is from base R
# these two steps are assuming you have data.table
# modify them accordingly if you have data.frame
data <- data[order(Name, Date)]
data[is.na(Success), Success := 0]
### tapply
data$past_success <- unlist(with(data, tapply(Success, Name, cumsum)))
### data.table
data[, past_success_dt := cumsum(Success), by = Name]
data
Name Success Date past_success past_success_dt
1: David 1 2017 1 1
2: David 1 2018 2 2
3: Peter 0 2004 0 0
4: Peter 1 2005 1 1
5: Peter 0 2010 1 1
6: Peter 1 2015 2 2
7: Peter 1 2016 3 3
8: Peter 0 2017 3 3