Search code examples
rcsvtibblereadr

Tables stacked below each other in csv - How do I import them in R?


I have a csv file that looks like this

Header, Car, LapTime
1, Spa, 0
0, Ferrari, 2.1
0, Aston Martin, 2.3
1, Monza, 0
0, Ferrari, 2.5
0, Ford, 2.6
0, Aston Martin, 2.4

That is, the csv contains multiple subtables stacked below each other, where the header of each subtable has header = 1. I don't know how many rows each sub-table has and how many sub-tables there are.

Now I would like to import this csv (preferably as a tidyverse tbl) to have it in R represented as

Car, LapTime, Circuit
Ferrari, 2.1, Spa
Aston Martin, 2.3, Spa
Ferrari, 2.5, Monza
Ford, 2.6, Monza
Aston Martin, 2.4, Monza

I cannot come up with a programmatic way to do this. Do you have an idea?


Solution

  • Read the file (see Note at end to reproduce it) and grouping by cumsum(Header) add the Circuit column, remove the first row of the group and remove the Header column.

    library(dplyr)
    library(readr)
    
    "cars.csv" %>%
      read_csv %>%
      group_by(Header = cumsum(Header)) %>%
      mutate(Circuit = first(Car)) %>%
      slice_tail(n = -1) %>%
      ungroup %>%
      select(-Header)
    

    giving

    # A tibble: 5 × 3
      Car          LapTime Circuit
      <chr>          <dbl> <chr>  
    1 Ferrari          2.1 Spa    
    2 Aston Martin     2.3 Spa    
    3 Ferrari          2.5 Monza  
    4 Ford             2.6 Monza  
    5 Aston Martin     2.4 Monza  
    

    Note

    Lines <- "Header, Car, LapTime
    1, Spa, 0
    0, Ferrari, 2.1
    0, Aston Martin, 2.3
    1, Monza, 0
    0, Ferrari, 2.5
    0, Ford, 2.6
    0, Aston Martin, 2.4
    "
    cat(Lines, file = "cars.csv")