Search code examples
rdplyrtidyrdata-manipulationtidy

rbind in a list in r (text to column)


I have 3 .csv files in one directory. The data sets have headers and text; text; text; format.

`list_data <- list("Red; Green", 
                  "Green; Blue",
                  "Blue; Yellow")
print(list_data)`

and I would like to merge them into 1 .csv based on the same headers.

  files  <- list.files(pattern = '\\.csv')
tables <- lapply(files, read.csv, header = TRUE)
combined.df <- do.call(rbind , tables)
str(combined.df)

How to get around this? many thanks in advance.


Solution

  • Making some assumptions about the real data, I'll change the sample data slightly:

    list_data <- list(file1="Red; Green\n1;2", file2="Green; Blue\n3;4", file3="Blue; Yellow\n5;6")
    

    I added names to enable the option to preserve file name in the rows of data (far below).

    # tables <- lapply(setNames(nm = files), read.csv2) # what you call with a file list
    tables <- lapply(list_data, function(z) read.csv2(text = z))
    str(tables)
    # List of 3
    #  $ file1:'data.frame':    1 obs. of  2 variables:
    #   ..$ Red  : int 1
    #   ..$ Green: int 2
    #  $ file2:'data.frame':    1 obs. of  2 variables:
    #   ..$ Green: int 3
    #   ..$ Blue : int 4
    #  $ file3:'data.frame':    1 obs. of  2 variables:
    #   ..$ Blue  : int 5
    #   ..$ Yellow: int 6
    dplyr::bind_rows(tables)
    #   Red Green Blue Yellow
    # 1   1     2   NA     NA
    # 2  NA     3    4     NA
    # 3  NA    NA    5      6
    data.table::rbindlist(tables, fill = TRUE, use.name = TRUE)
    #      Red Green  Blue Yellow
    #    <int> <int> <int>  <int>
    # 1:     1     2    NA     NA
    # 2:    NA     3     4     NA
    # 3:    NA    NA     5      6
    

    If you want to preserve the file name with each row, then

    dplyr::bind_rows(tables, .id = "filename")
    #   filename Red Green Blue Yellow
    # 1    file1   1     2   NA     NA
    # 2    file2  NA     3    4     NA
    # 3    file3  NA    NA    5      6
    data.table::rbindlist(tables, fill = TRUE, use.name = TRUE, idcol = "filename")
    #    filename   Red Green  Blue Yellow
    #      <char> <int> <int> <int>  <int>
    # 1:    file1     1     2    NA     NA
    # 2:    file2    NA     3     4     NA
    # 3:    file3    NA    NA     5      6