sapply function with an ifelse condition


I'm new to the apply functions, so thanks in advance for the help. I have a data frame (df) and need to clean only a subset of rows in column x: the rows whose value contains a hyphen. I have included a column x_clean in df to show the result I expect from the cleaning. If a value in column x contains a hyphen, I want to pad the part before the hyphen with leading 0s until it has 5 digits, and the part after the hyphen with leading 0s until it has 4 digits. If there is no hyphen in the string, I want to set it to NA. This is what I have tried so far, without success:

library(dplyr)
library(stringr)

df=data.frame(x=c("55555555","4444-444","NULL","hello","0065440006123","22-111"))%>%
  mutate(nchar=nchar(x), 
         detect=str_detect(x,"-"),
         x_clean=c(NA,"04444-0444",NA,NA,NA,"00022-0111"))
df%>%mutate(xclean=sapply(strsplit(x,"-"), function(x)
  {ifelse(detect==T,
    paste(sprintf("%05d",as.numeric(x[1])), sprintf("%04d",as.numeric(x[2])), sep="-"),NA)}))

I have also tried this as well:

df%>%mutate(x_clean=
             if (detect==T) {sapply(strsplit(x,"-"), function(x)paste(sprintf("%05d",as.numeric(x[1])), sprintf("%04d",as.numeric(x[2])), sep="-"))}
              else {NA})

Solution
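
Before the fixes, it may help to see why the attempts in the question misbehave. In the first attempt, detect inside the anonymous function is the whole column rather than the value for the current element, so ifelse() returns one value per row on every call and sapply() builds a matrix instead of a vector. In the second, if expects a single TRUE/FALSE, and a longer condition is an error from R 4.2.0 onward (earlier versions warned and used only the first element). A minimal sketch of the first issue, using toy values that are assumptions for illustration rather than the question's data:

```r
# Toy stand-ins for the question's columns (illustrative values only)
x      <- c("555", "44-4", "hello")
detect <- grepl("-", x, fixed = TRUE)
parts  <- strsplit(x, "-", fixed = TRUE)

# Attempt-style call: each call sees the full `detect` vector,
# so every result has length 3 and sapply() returns a 3 x 3 matrix.
wrong <- sapply(parts, function(p) ifelse(detect, paste(p, collapse = "-"), NA))
dim(wrong)

# Testing the current element itself gives one value per input.
right <- sapply(parts, function(p) {
  if (length(p) == 2) paste(p, collapse = "-") else NA_character_
})
right
```

Both approaches below avoid the problem by testing one element at a time.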

  • An approach with dplyr, without sapply

    library(dplyr)
    
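    df %>% 
      rowwise() %>%                       # evaluate the expressions one row at a time
      mutate(xclean = strsplit(x, "-"),   # the parts of x around the hyphen
             # pad both parts only when x contains a hyphen, otherwise NA
             xclean = ifelse(grepl("-", x), sprintf("%05d%s%04d", 
               as.integer(xclean[1]), "-", as.integer(xclean[2])), NA)) %>% 
      ungroup()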
    # A tibble: 6 × 2
      x             xclean    
      <chr>         <chr>     
    1 55555555      NA        
    2 4444-444      04444-0444
    3 NULL          NA        
    4 hello         NA        
    5 0065440006123 NA        
    6 22-111        00022-0111
    

  • An approach with just sapply (base R)

    # split each value on "-" and format only those that yield exactly two parts
    data.frame(df, xclean = sapply(strsplit(df$x, "-"), function(y) 
      ifelse(length(y) == 2, 
        sprintf("%05d%s%04d", as.integer(y[1]), "-", as.integer(y[2])), NA)))
                  x     xclean
    1      55555555       <NA>
    2      4444-444 04444-0444
    3          NULL       <NA>
    4         hello       <NA>
    5 0065440006123       <NA>
    6        22-111 00022-0111
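
Since sprintf and sub are vectorised, the same result can probably be had without rowwise() or sapply at all. A sketch in base R, where pad_hyphenated is a hypothetical helper name and the stricter digits-hyphen-digits pattern is an assumption (the answers above only check for a hyphen):

```r
x <- c("55555555", "4444-444", "NULL", "hello", "0065440006123", "22-111")

# Hypothetical helper: pads "a-b" to 5 and 4 digits, returns NA for anything else
pad_hyphenated <- function(x) {
  # Assumption: only values of the form digits-hyphen-digits are cleaned
  ok  <- grepl("^[0-9]+-[0-9]+$", x)
  out <- rep(NA_character_, length(x))
  left  <- as.integer(sub("-.*$", "", x[ok]))  # part before the hyphen
  right <- as.integer(sub("^.*-", "", x[ok]))  # part after the hyphen
  out[ok] <- sprintf("%05d-%04d", left, right)
  out
}

pad_hyphenated(x)
```

The helper can be used in the pipeline as df %>% mutate(xclean = pad_hyphenated(x)). Parts too large for R's integer type are not handled by this sketch.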