I'm new to the apply family of functions, so thanks in advance for the help. I have a dataset (df) and only need to clean a subset of the rows in column x: the rows that contain a hyphen. I have included a column x_clean in df with the result I expect from the cleaning. If a value in x contains a hyphen, I want to left-pad the part before the hyphen with 0s to 5 digits and the part after the hyphen with 0s to 4 digits. If there is no hyphen, the result should be NA. This is what I have tried so far, without success:
```r
library(dplyr)
library(stringr)

df <- data.frame(x = c("55555555", "4444-444", "NULL", "hello", "0065440006123", "22-111")) %>%
  mutate(nchar = nchar(x),
         detect = str_detect(x, "-"),
         x_clean = c(NA, "04444-0444", NA, NA, NA, "00022-0111"))

df %>% mutate(xclean = sapply(strsplit(x, "-"), function(x) {
  ifelse(detect == T,
         paste(sprintf("%05d", as.numeric(x[1])), sprintf("%04d", as.numeric(x[2])), sep = "-"),
         NA)
}))
```
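A likely reason this attempt fails: inside the anonymous function, `detect` is not the value for the current element but the whole six-element column, so `ifelse()` returns six values for every element instead of one. It also means the `sprintf()` branch runs for values it may not be able to format with `%d` (for example, `0065440006123` is larger than R's integer range once converted with `as.numeric()`). A minimal sketch of a per-element version, where the helper name `pad_one` is hypothetical, moves the hyphen test inside the function:

```r
# Hypothetical helper: cleans ONE value of x at a time
pad_one <- function(value) {
  parts <- strsplit(value, "-", fixed = TRUE)[[1]]
  # strsplit() gives two pieces for values like "4444-444"
  if (length(parts) == 2) {
    sprintf("%05d-%04d", as.integer(parts[1]), as.integer(parts[2]))
  } else {
    NA_character_
  }
}

x <- c("55555555", "4444-444", "NULL", "hello", "0065440006123", "22-111")

# vapply() insists on exactly one character result per value
x_clean <- vapply(x, pad_one, character(1), USE.NAMES = FALSE)
print(x_clean)
```

Inside the question's pipeline the same call would be `df %>% mutate(x_clean = vapply(x, pad_one, character(1), USE.NAMES = FALSE))`.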
I have also tried this:
```r
df %>% mutate(x_clean =
  if (detect == T) {
    sapply(strsplit(x, "-"), function(x) paste(sprintf("%05d", as.numeric(x[1])), sprintf("%04d", as.numeric(x[2])), sep = "-"))
  } else {
    NA
  })
```
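The second attempt most likely fails because `if` expects a single `TRUE` or `FALSE`, while `detect == T` is a vector with one value per row. Since R 4.2.0 that is an error ("the condition has length > 1"); earlier versions only warned and used the first element. `ifelse()` is the vectorised counterpart. A small demonstration, using a hypothetical stand-in for the `detect` column:

```r
# Hypothetical stand-in for df$detect from the question
detect <- c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE)

# if() needs exactly one TRUE/FALSE; catch whatever this R version does with six
outcome <- tryCatch(
  if (detect == TRUE) "clean" else NA,
  error = function(e) paste("error:", conditionMessage(e)),
  warning = function(w) paste("warning:", conditionMessage(w))
)
print(outcome)

# ifelse() evaluates the test element by element and returns one value per row
res <- ifelse(detect, "clean", NA)
print(res)
```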
An approach with dplyr, without sapply:
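```r
library(dplyr)

# Only the x column from the question's data
df <- data.frame(x = c("55555555", "4444-444", "NULL", "hello", "0065440006123", "22-111"))

df %>%
  rowwise() %>%
  # strsplit() gives a list-column; per row, xclean[1] and xclean[2] are the pieces around the hyphen
  mutate(xclean = strsplit(x, "-"),
         xclean = ifelse(grepl("-", x),
                         sprintf("%05d%s%04d", as.integer(xclean[1]), "-", as.integer(xclean[2])),
                         NA)) %>%
  ungroup()
```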
```
# A tibble: 6 × 2
  x             xclean
  <chr>         <chr>
1 55555555      NA
2 4444-444      04444-0444
3 NULL          NA
4 hello         NA
5 0065440006123 NA
6 22-111        00022-0111
```
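`rowwise()` evaluates the expressions once per row. A vectorised alternative, sketched here under the assumption that stringr is available (the question already uses it), pads the two pieces as text with `str_pad()` instead of converting them to integers, and uses `if_else()` to return NA for values without a hyphen:

```r
library(dplyr)
library(stringr)

df <- data.frame(x = c("55555555", "4444-444", "NULL", "hello", "0065440006123", "22-111"))

df <- df %>%
  mutate(
    # Text before the first hyphen and after the last one
    before = str_extract(x, "^[^-]+"),
    after  = str_extract(x, "[^-]+$"),
    # Left-pad each piece with "0" as text; NA where there is no hyphen
    xclean = if_else(str_detect(x, "-"),
                     paste0(str_pad(before, 5, pad = "0"), "-", str_pad(after, 4, pad = "0")),
                     NA_character_)
  ) %>%
  select(x, xclean)

df
```

Because the pieces stay text, this gives the same result for the sample data but can differ from the integer-based answer on inputs such as `"ab-c"` or `"1-2-3"`; which behaviour is right depends on the real data.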
Using only sapply (base R):
```r
data.frame(df, xclean = sapply(strsplit(df$x, "-"), function(y)
  ifelse(length(y) == 2,
         sprintf("%05d%s%04d", as.integer(y[1]), "-", as.integer(y[2])),
         NA)))
```
```
              x     xclean
1      55555555       <NA>
2      4444-444 04444-0444
3          NULL       <NA>
4         hello       <NA>
5 0065440006123       <NA>
6        22-111 00022-0111
```
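The `length(y) == 2` test works because `strsplit()` returns one piece on each side of a hyphen. It is not quite the same test as `grepl("-", x)` in the dplyr answer: R's `strsplit()` drops a trailing empty piece, so the two disagree on values such as `"1-2-3"` or `"4444-"`. A quick check with a few hypothetical edge-case values:

```r
# Hypothetical edge cases, not from the question's data
x <- c("4444-444", "hello", "1-2-3", "4444-", "-444")

parts <- strsplit(x, "-", fixed = TRUE)
# Number of pieces per value; a trailing hyphen does not produce an empty last piece
n_pieces <- lengths(parts)
print(n_pieces)

# grepl() only asks whether a hyphen is present at all
has_hyphen <- grepl("-", x, fixed = TRUE)
print(has_hyphen)
```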