Search code examples
rcountfrequency

R: counting frequency of a categorical variable (conditional on date )


I have three columns "Name", "success Dummy" and "Date". For each NAME I want to check the PAST SUCCESS for that NAME.

So for example if name "Peter" has occurred three times, for each time I want to count the number of "Peter" with "Success ==1" and Date happened before.

Example of the output that I need to get for "Past Success" column.

Name    Success Date   Past Success
David      1    2018    1
Peter      0    2017    3
Peter      1    2016    2
David      1    2017    0
Peter      1    2015    1
Peter      0    2010    1
Peter      1    2005    0
Peter     NA    2004    0

Is there any way to do it fast?

Also I need it to be very fast because my data is huge.

What I did is I sorted my data based on Names and Dates and check each observation against 100 observations before (because max of frequency of Names is 100).

Please advise if there is a better way to do that.


Solution

  • Here are two ways to do it. One of them is almost as @FALL Gora, but the other is from base R

    # these two steps are assuming you have data.table
    # modify them accordingly if you have data.frame
    data <- data[order(Name, Date)]
    data[is.na(Success), Success := 0]
    
    ### tapply
    data$past_success <- unlist(with(data, tapply(Success, Name, cumsum)))
    
    ### data.table
    data[, past_success_dt := cumsum(Success), by = Name]
    
    data
    
        Name Success Date past_success past_success_dt
    1: David       1 2017            1               1
    2: David       1 2018            2               2
    3: Peter       0 2004            0               0
    4: Peter       1 2005            1               1
    5: Peter       0 2010            1               1
    6: Peter       1 2015            2               2
    7: Peter       1 2016            3               3
    8: Peter       0 2017            3               3