
How to make an index column based on an existing column with a specific pattern in R?


I have a column named set in a data frame df, which looks like this:

df <- data.frame(set = c("","","","","","set","","","","","set","","","","","set"))

Now I want a column set_seq, based on the pattern in column set, which should look like this:

df <- data.frame(set = c("","","","","","set","","","","","set","","","","","set"),
                 set_seq = c("","","","","","1","","","","","2","","","","","3"))

Can anyone help me do this in R? I tried the cumsum function with data.table, but it didn't help. Note that the cells are empty strings, not NA.


Solution

  • I'm not sure about the exact condition, but for the data you provided, this works:

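    library(tidyverse)
    dat <- data.frame(set = c("","","","","","set","","","","","set","","","","","set"))
    # Group by the value of set so rows are numbered within each group,
    # then blank out the numbers for the empty-string rows.
    dat %>%
      group_by(set) %>%
      mutate(set_seq = ifelse(set == "", "", as.character(seq_len(n()))))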
    
       set   set_seq
       <chr> <chr>  
     1 ""    ""     
     2 ""    ""     
     3 ""    ""     
     4 ""    ""     
     5 ""    ""     
     6 "set" "1"    
     7 ""    ""     
     8 ""    ""     
     9 ""    ""     
    10 ""    ""     
    11 "set" "2"    
    12 ""    ""     
    13 ""    ""     
    14 ""    ""     
    15 ""    ""     
    16 "set" "3" 
    

    Printed as a plain data.frame, the same result looks like this:

       set set_seq
    1             
    2             
    3             
    4             
    5             
    6  set       1
    7             
    8             
    9             
    10            
    11 set       2
    12            
    13            
    14            
    15            
    16 set       3
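
Since the question mentions cumsum, here is a minimal base R sketch of that route, for comparison with the grouped version above. It assumes the marker is exactly the string "set" and that all other cells are empty strings, as in the sample data. The helper name is_set is only illustrative.

```r
# Same sample data as in the question (empty strings, not NA).
dat <- data.frame(
  set = c("", "", "", "", "", "set", "", "", "", "", "set", "", "", "", "", "set"),
  stringsAsFactors = FALSE
)

# TRUE on marker rows; assumes the marker is exactly the string "set".
is_set <- dat$set == "set"

# cumsum() gives a running count of markers seen so far;
# keep that count only on the marker rows and leave the rest empty.
dat$set_seq <- ifelse(is_set, as.character(cumsum(is_set)), "")
```

On the sample data this fills set_seq with "1", "2" and "3" on the three marker rows. Unlike group_by(set), which restarts the numbering for every distinct value in the column, this counts only rows equal to "set", so it may be the safer choice if the real column contains other non-empty values.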