
Modifying seq_along with duplicates


Any seq Experts here?

I want to number the values within each ID. Counting from 1 to n in the correct order works fine, but duplicated values should be labeled with the same number.

Is there a parameter for seq that I am missing?

Reproducible example, where "count_n" is the value I currently create and "need" is the desired output:

Thank you in advance. Cheers

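library(dplyr)  # needed for %>% and arrange()

Date <- as.Date(c('2006-08-30', '2006-08-30', '2006-08-23', '2006-09-06',
                  '2006-09-13', '2006-09-20'))
ID <- c("x1", "x1", "x1", "X2", "X3", "x1")
need <- c(2, 2, 1, 1, 1, 3)
df <- data.frame(ID, Date, need)

df <- df %>% arrange(Date)
# seq_along numbers every row within an ID, so tied dates get different numbers
df$count_n <- ave(as.numeric(df$Date), df$ID, FUN = seq_along)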
  ID       Date need count_n
1 x1 2006-08-23    1       1
2 x1 2006-08-30    2       2
3 x1 2006-08-30    2       3
4 X2 2006-09-06    1       1
5 X3 2006-09-13    1       1
6 x1 2006-09-20    3       4

Solution

  • We can convert Date to a factor within each ID. The factor's integer codes give one number per unique Date, so duplicated dates share the same number:

    ave(as.integer(df$Date),df$ID,FUN = factor)
    #[1] 1 2 2 1 1 3
    

    We can also use dense_rank from dplyr, which ranks values without gaps and gives ties the same rank:

    library(dplyr)
    df %>%
      group_by(ID) %>%
      mutate(count_n = dense_rank(Date))
    
    #  ID    Date        need count_n
    #  <fct> <date>     <dbl>   <int>
    #1 x1    2006-08-23     1       1
    #2 x1    2006-08-30     2       2
    #3 x1    2006-08-30     2       2
    #4 X2    2006-09-06     1       1
    #5 X3    2006-09-13     1       1
    #6 x1    2006-09-20     3       3
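
For comparison, here is a minimal base R sketch of what both approaches above compute: within each ID, every Date is looked up in the sorted unique dates of that ID. The helper name dense_index is only illustrative and not part of any package; the data is the example from the question, ordered with order() instead of arrange() so that no package is needed.

```r
# Example data from the question
Date <- as.Date(c('2006-08-30', '2006-08-30', '2006-08-23', '2006-09-06',
                  '2006-09-13', '2006-09-20'))
ID <- c("x1", "x1", "x1", "X2", "X3", "x1")
need <- c(2, 2, 1, 1, 1, 3)
df <- data.frame(ID, Date, need)

# Sort by Date (base R stand-in for arrange(Date))
df <- df[order(df$Date), ]

# Hypothetical helper: position of each value among the sorted unique values,
# so tied values receive the same number and no numbers are skipped
dense_index <- function(d) match(d, sort(unique(d)))

# Apply the helper separately within each ID
df$count_n <- ave(as.integer(df$Date), df$ID, FUN = dense_index)

print(df)
```

With this data, count_n should match the need column row by row.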