Search code examples
rtidyversedelimiter

R separate_wider ignore comma inside quotes


I have data that I had to export from Airtable into a CSV file. There are some columns where multiple pieces of information are included, separated by commas. However, in some instances the separate pieces of information also include a comma, so that information is contained within quotes, like this:

df <- data.frame(str=c("A,B,C","D,E",'H,"I,J"'))

initial df

I'm trying to separate the pieces, by comma, into separate columns...but for row 3 I need "I, J" to stay together in the 2nd separated column (ideally, I'd like to remove the quotes after the split). However, when I use the following these two pieces are in separate columns:

df_splt <- df %>% separate_wider_delim(str, delim = ",", names = c(paste0("str_", seq_len(max(str_count(df$str, ",") + 1)))), too_few="align_start")

Yielding: enter image description here

I see there is a separate_wider_regex, but I'm unclear how that would work given I have varying numbers of commas within str.


Solution

  • You can use read.csv() which will only parse the commas that are not inside quotes, i.e.

    setNames(read.csv(text = df$str, header = FALSE, stringsAsFactors = FALSE), 
             paste0("str_", seq_len(max(stringr::str_count(df$str, ",") + 1))))
    
      str_1 str_2 str_3
    1     A     B     C
    2     D     E      
    3     H   I,J