Is it possible to skip an introductory paragraph using arrow::open_dataset in R?

I have 20 datasets, and some of them have an introduction in the first few rows. Not every dataset has an introduction, and the number of introduction rows differs from dataset to dataset, so a fixed skip_rows value is not useful. Is it possible to detect keywords and start reading from the row that contains them?

Sample dataset:

dataset 1:

balabala	balabala...
A header	Another header
First	row
Second	row

dataset 2:

A header	Another header
First	row
Second	row

dataset 3:

balabala	balabala...
balabala	balabala...
A header	Another header
First	row
Second	row

etc...

What I want:

dataset 1:

A header	Another header
First	row
Second	row

dataset 2:

A header	Another header
First	row
Second	row

dataset 3:

A header	Another header
First	row
Second	row

etc...

Solution

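If the header row is known in advance, you can read each dataset as-is, drop every row above the one that matches the header, and promote that row to the column names. You may try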

library(dplyr)
library(janitor)

# One introduction row above the real header
df1 <- read.table(text = "balabala  balabala...
'A header'  'Another header'
First   row
Second  row", header = TRUE)

# No introduction: the first row is already the header
df2 <- read.table(text = "'A header'    'Another header'
First   row
Second  row", header = TRUE, check.names = FALSE)

# Two introduction rows above the real header
df3 <- read.table(text = "balabala  balabala...
balabala    balabala...
'A header'  'Another header'
First   row
Second  row", header = TRUE)

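# The header row we expect to find in every dataset
header_vector <- c("A header", "Another header")

ftn <- function(df) {
  if (identical(names(df), header_vector)) {
    # No introduction: the column names are already the header
    df
  } else {
    # Flag the row whose cells equal the expected header
    # (isTRUE guards against NA cells in the introduction rows)
    df$key <- apply(df, 1, function(x) isTRUE(all(x == header_vector)))
    df %>%
      mutate(key = cumsum(key)) %>%
      filter(key >= 1) %>%   # keep the header row and everything below it
      select(-key) %>%
      janitor::row_to_names(row_number = 1)   # promote the header row to names
  }
}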

ftn(df1)
  A header Another header
2    First            row
3   Second            row


ftn(df2)
  A header Another header
1    First            row
2   Second            row

ftn(df3)
  A header Another header
2    First            row
3   Second            row
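
If you would rather skip the introduction while reading than trim it afterwards, a minimal sketch is to scan the first lines of each file for the header row and pass the resulting count as the number of lines to skip. The helpers find_skip and read_from_header are hypothetical names, not part of any package, and the sketch assumes tab-delimited files whose header appears within the first 100 lines. It uses base R's read.delim rather than arrow; the count could presumably be passed to arrow's delimited readers as their skip argument, but that is not tested here.

```r
# Hypothetical helper: number of lines that sit above the real header row.
find_skip <- function(path, header_vector, sep = "\t", max_lines = 100L) {
  lines <- readLines(path, n = max_lines, warn = FALSE)
  fields <- strsplit(lines, sep, fixed = TRUE)
  # A line is the header if its trimmed fields equal the expected names exactly.
  is_header <- vapply(
    fields,
    function(x) identical(trimws(x), header_vector),
    logical(1)
  )
  hit <- which(is_header)
  if (length(hit) == 0L) stop("header row not found in: ", path)
  hit[1] - 1L
}

# Hypothetical helper: read a file starting at its header row.
read_from_header <- function(path, header_vector, sep = "\t") {
  skip <- find_skip(path, header_vector, sep)
  read.delim(path, sep = sep, skip = skip,
             check.names = FALSE, stringsAsFactors = FALSE)
}

# Usage on a temporary file with one introduction line
tmp <- tempfile(fileext = ".tsv")
writeLines(c("balabala\tbalabala...",
             "A header\tAnother header",
             "First\trow",
             "Second\trow"), tmp)
df <- read_from_header(tmp, c("A header", "Another header"))
```

Because the skip count is computed per file, the same call works for files with zero, one, or several introduction rows, for example via lapply over the vector of file paths.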