Split the values of a column and make a new column from the second element in R


This is hard for me, so please help me with it. I have a data frame that looks like this:

    col1      col2    col3
ccd_x29807 Gly_GCC_89 0.3
ccd_x29807 Gly_GCC_87 0.3
ccd_x29807 Gly_GCC_88 0.3
ccd_x20463 Lys_CTT_12 0.1

I want to extract the number that follows the x in col1 and save it in a new column, col4. The output should look like this:

    col1      col2   col3 col4
ccd_x29807 Gly_GCC_89 0.3 29807
ccd_x29807 Gly_GCC_87 0.3 29807
ccd_x29807 Gly_GCC_88 0.3 29807
ccd_x20463 Lys_CTT_12 0.1 20463

I tried the following, but it puts 29807 in every row:

library(dplyr)
library(stringr)
df %>%
  mutate(col4 = str_split(col1, "x")[[1]][2])
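The attempt above fills every row with the same value because str_split() (like base R's strsplit()) returns a list with one vector of pieces per row, and [[1]][2] picks only the second piece of the first row. That single value is then recycled down the whole column. A minimal base R sketch of a per-row version, assuming every col1 value contains an x (pieces is just a helper variable name chosen for this example):

```r
# Example data from the question. stringsAsFactors = FALSE keeps col1 as
# character, which strsplit() requires (also on R versions before 4.0).
df <- data.frame(
  col1 = c("ccd_x29807", "ccd_x29807", "ccd_x29807", "ccd_x20463"),
  col2 = c("Gly_GCC_89", "Gly_GCC_87", "Gly_GCC_88", "Lys_CTT_12"),
  col3 = c(0.3, 0.3, 0.3, 0.1),
  stringsAsFactors = FALSE
)

# strsplit() returns a list: one character vector of pieces per row,
# e.g. c("ccd_", "29807") for the first row
pieces <- strsplit(df$col1, "x", fixed = TRUE)

# pieces[[1]][2] would be a single value (the first row's second piece),
# which R recycles to every row. Taking element 2 from EVERY list entry
# gives one value per row instead.
df$col4 <- vapply(pieces, function(p) p[2], character(1))

print(df)
```

Inside a dplyr pipeline the same idea can be written as mutate(col4 = vapply(strsplit(col1, "x", fixed = TRUE), function(p) p[2], character(1))). The result is character; wrap it in as.integer() if you need numbers.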

Solution

  • You can use separate() from the tidyr package, together with mutate() from dplyr.

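    library(dplyr)  # mutate() and the %>% pipe
    library(tidyr)  # separate()
    
    df <- data.frame(
      col1 = c("ccd_x29807", "ccd_x29807", "ccd_x29807", "ccd_x20463"),
      col2 = c("Gly_GCC_89", "Gly_GCC_87", "Gly_GCC_88", "Lys_CTT_12"),
      col3 = c(0.3, 0.3, 0.3, 0.1)
    )
    
    df %>%
      # copy col1, because separate() removes the column it splits
      mutate(col_temp = col1) %>%
      # split on "x": drop the first piece ("ccd_") and keep the digits as col4
      separate(col_temp, into = c(NA, "col4"), sep = "x")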
    

    Output:

            col1       col2 col3  col4
    1 ccd_x29807 Gly_GCC_89  0.3 29807
    2 ccd_x29807 Gly_GCC_87  0.3 29807
    3 ccd_x29807 Gly_GCC_88  0.3 29807
    4 ccd_x20463 Lys_CTT_12  0.1 20463
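If you would rather not depend on tidyr, a regular expression with base R's sub() can do the same in one step. This is a sketch that assumes every col1 value looks like the examples (a prefix ending in x followed only by digits); a value without any x would be left unchanged by sub(), and as.integer() would then turn it into NA with a warning.

```r
# Same example data as in the answer above
df <- data.frame(
  col1 = c("ccd_x29807", "ccd_x29807", "ccd_x29807", "ccd_x20463"),
  col2 = c("Gly_GCC_89", "Gly_GCC_87", "Gly_GCC_88", "Lys_CTT_12"),
  col3 = c(0.3, 0.3, 0.3, 0.1),
  stringsAsFactors = FALSE
)

# ".*x" is greedy: it matches everything up to and including the last "x"
# in each value. sub() replaces that match with "", so only the digits
# remain, and as.integer() converts them to numbers.
df$col4 <- as.integer(sub(".*x", "", df$col1))

print(df)
```

Because sub() is vectorized over df$col1, it works row by row without any list handling, which avoids the recycling problem from the original attempt.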