I have a csv file that looks like this
Header, Car, LapTime
1, Spa, 0
0, Ferrari, 2.1
0, Aston Martin, 2.3
1, Monza, 0
0, Ferrari, 2.5
0, Ford, 2.6
0, Aston Martin, 2.4
That is, the csv contains multiple subtables stacked below each other, where the header of each subtable has header = 1
. I don't know how many rows each sub-table has and how many sub-tables there are.
Now I would like to import this csv (preferably as a tidyverse tbl) to have it in R represented as
Car, LapTime, Circuit
Ferrari, 2.1, Spa
Aston Martin, 2.3, Spa
Ferrari, 2.5, Monza
Ford, 2.6, Monza
Aston Martin, 2.4, Monza
I cannot come up with a programmatic way to do this. Do you have an idea?
Read the file (see Note at end to reproduce it) and grouping by cumsum(Header)
add the Circuit
column, remove the first row of the group and remove the Header
column.
library(dplyr)
library(readr)
"cars.csv" %>%
read_csv %>%
group_by(Header = cumsum(Header)) %>%
mutate(Circuit = first(Car)) %>%
slice_tail(n = -1) %>%
ungroup %>%
select(-Header)
giving
# A tibble: 5 × 3
Car LapTime Circuit
<chr> <dbl> <chr>
1 Ferrari 2.1 Spa
2 Aston Martin 2.3 Spa
3 Ferrari 2.5 Monza
4 Ford 2.6 Monza
5 Aston Martin 2.4 Monza
Lines <- "Header, Car, LapTime
1, Spa, 0
0, Ferrari, 2.1
0, Aston Martin, 2.3
1, Monza, 0
0, Ferrari, 2.5
0, Ford, 2.6
0, Aston Martin, 2.4
"
cat(Lines, file = "cars.csv")