I have 20 datasets, and some of them have introductory text in the first few rows. Not all of the datasets have an introduction, and the number of introduction rows differs from one dataset to another, so a fixed skip_rows value may not be useful. Is it possible to detect keywords and start reading from the row that contains them?
Sample dataset:
dataset 1:
balabala | balabala... |
---|---|
A header | Another header |
First | row |
Second | row |
dataset 2:
A header | Another header |
---|---|
First | row |
Second | row |
dataset 3:
balabala | balabala... |
---|---|
balabala | balabala... |
A header | Another header |
First | row |
Second | row |
etc...
What I want:
dataset 1:
A header | Another header |
---|---|
First | row |
Second | row |
dataset 2:
A header | Another header |
---|---|
First | row |
Second | row |
dataset 3:
A header | Another header |
---|---|
First | row |
Second | row |
etc...
You may try the following, which drops every row above the one that matches the expected header:
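library(dplyr)
library(janitor)

# Example 1: one introduction row above the real header
df1 <- read.table(text = "balabala balabala...
'A header' 'Another header'
First row
Second row", header = TRUE)

# Example 2: no introduction, the header is already the first row
df2 <- read.table(text = "'A header' 'Another header'
First row
Second row", header = TRUE, check.names = FALSE)

# Example 3: two introduction rows above the real header
df3 <- read.table(text = "balabala balabala...
balabala balabala...
'A header' 'Another header'
First row
Second row", header = TRUE)

# Keywords: the column names that mark the real header row
header_vector <- c("A header", "Another header")

ftn <- function(df) {
  if (identical(names(df), header_vector)) {
    # The header is already in place, nothing to remove
    df
  } else {
    # Flag the row whose values match the expected header
    df$key <- apply(df, 1, function(x) all(x == header_vector))
    df %>%
      mutate(key = cumsum(key)) %>%  # 0 above the header row, >= 1 from it onward
      filter(key >= 1) %>%           # drop the introduction rows
      select(-key) %>%
      janitor::row_to_names(row_number = 1)  # promote the header row to column names
  }
}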
ftn(df1)
A header Another header
2 First row
3 Second row
ftn(df2)
A header Another header
1 First row
2 Second row
ftn(df3)
A header Another header
2 First row
3 Second row
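If the datasets are plain-text files, the same idea can also be applied while reading, which is closer to the original question about skipping rows. The following is only a minimal sketch under some assumptions: `read_from_header` is a hypothetical helper (not part of any package), the keyword is assumed to appear for the first time in the header line, and the demo file is comma-separated, so `sep` needs adjusting to match your real files.

```r
# Hypothetical helper: read a delimited text file starting at the first
# line that contains `keyword` (assumed to be the header line).
read_from_header <- function(path, keyword, sep = ",") {
  lines <- readLines(path, warn = FALSE)
  # Index of the first line containing the keyword (fixed string, no regex)
  header_line <- grep(keyword, lines, fixed = TRUE)[1]
  if (is.na(header_line)) {
    stop("Keyword '", keyword, "' not found in ", path)
  }
  # Parse only the lines from the header onward
  read.table(
    text = lines[header_line:length(lines)],
    header = TRUE,
    sep = sep,
    strip.white = TRUE,
    check.names = FALSE,
    stringsAsFactors = FALSE
  )
}

# Demo with a temporary file that has two introduction rows
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "balabala,balabala...",
  "balabala,balabala...",
  "A header,Another header",
  "First,row",
  "Second,row"
), tmp)

res <- read_from_header(tmp, keyword = "A header", sep = ",")
print(res)
```

With a character vector of file paths (say `files`, also hypothetical), something like `lapply(files, read_from_header, keyword = "A header")` would read all 20 datasets in one pass, whether or not a given file has an introduction.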