Search code examples
rdataframelapplytidyr

Apply tidyr::separate over multiple columns


I would like to iterate over columns in a dataframe and split them into the based on a separator. I am using tidyr::separate, which works when I do one column at a time.

For example:

df<- data.frame(a = c("5312,2020,1212"), b = c("345,982,284"))

df <- separate(data = df, col = "a", 
                         into = paste("a", c("col1", "col2", "col3"), 
                                      sep = "_"), sep = ",")

Returns:

  a_col1 a_col2 a_col3           b
1   5312   2020   1212 345,982,284

When I try to execute the same operation over each column of df R returns an error

For example I used this for loop:

for(col in names(df)){
    df <- separate(data = df, col = col, 
into = paste(col, c("col1", "col2", "col3), 
sep = "_"), sep = ",")
    }

I was expecting to get the following output:

  a_col1 a_col2 a_col3 b_col1 b_col2 b_col3
1   5312   2020   1212    345    982    284

However R returns this error:

Error in if (!after) c(values, x) else if (after >= lengx) c(x, values) else c(x[1L:after],  : 
  argument is of length zero

Is there another way to apply tidyr::separate over multiple columns in a data frame?


Solution

  • You could feed a customized separate_() call into Reduce().

    sep <- function(...) {
        dots <- list(...)
        n <- stringr::str_count(dots[[1]][[dots[[2]]]], "\\d+")
        separate_(..., into = sprintf("%s_col%d", dots[[2]], 1:n))
    }
    
    df %>% Reduce(f = sep, x = c("a", "b"))
    #   a_col_1 a_col_2 a_col_3 b_col_1 b_col_2 b_col_3
    # 1    5312    2020    1212     345     982     284
    

    Otherwise, cSplit will do it too.

    splitstackshape::cSplit(df, names(df))
    #     a_1  a_2  a_3 b_1 b_2 b_3
    # 1: 5312 2020 1212 345 982 284