For an excel file with multiple row of headers as such (test data for downloading from here):
How could I skip rows Unit
and Frequency
and use indicator_name
as header of excel file while reading excel with R?
With code below, it seems I could only skip one row by setting skip parameter an integer.
library(readxl)
myCols <- as.character(read_excel("./test123.xlsx", n_max = 1, col_names = FALSE))
myDF <- read_excel("./test123.xlsx", skip = 2, col_names = myCols)
Reference:
You just need to skip = 3
instead of 2, as you need to skip the header when you read in the data. Since we have already defined column names in myCols
, then we do not need to keep the column name row when you read it in.
library(readxl)
myCols <- as.character(read_excel("./test123.xlsx", n_max = 1, col_names = FALSE))
myDF <- read_excel("./test123.xlsx", skip = 3, col_names = myCols)
Output
indicator_name M2 GDP
<dttm> <dbl> <dbl>
1 2018-01-01 00:00:00 6.71 8.17
2 2018-01-02 00:00:00 6.79 8.19
3 2018-01-03 00:00:00 6.77 8.21
4 2018-01-04 00:00:00 6.73 8.20
5 2018-01-05 00:00:00 6.67 8.20
6 2018-01-06 00:00:00 6.62 8.21
7 2018-01-07 00:00:00 6.62 8.21
8 2018-01-08 00:00:00 6.64 8.22
9 2018-01-09 00:00:00 6.64 8.22
If you have the first column name blank, then you could replace the NA
in the column names before reading in the data.
library(tidyverse)
myCols <- read_excel("./test123.xlsx", n_max = 2, col_names = FALSE) %>%
slice(1) %>%
mutate(across(everything(), ~replace_na(., "indicator_name"))) %>%
as.character()
myDF <- read_excel("./test123.xlsx", skip = 3, col_names = myCols)
Benchmark
In this instant, it looks like it would still be faster to just filter out the rows after reading them in.
bm <- microbenchmark::microbenchmark(filter_before = {myCols <- as.character(read_excel("./test123.xlsx", n_max = 1, col_names = FALSE));
myDF <- read_excel("./test123.xlsx", skip = 3, col_names = myCols)},
filter_after = {myDF2 <- read_excel("./test123.xlsx");
myDF2 <- myDF2[-c(1:2),]},
times = 1000)
autoplot(bm)