Search code examples
rreadxl

How to read rectangle surrounded by whitespace using readxl


I have an excel sheet with the following data which I am trying to read into r:

enter image description here

I would like to read in the table at A3:C6. However, the dimensions of this data may change. I want to define it as the data starting at A3 in the top left, stopping at whitespace to the right, and whitespace at the bottom.

I am currently using read_xlsx(skip = 2) but this also reads in row 8, which I do not want.


Solution

  • The following code seems to do what the question asks for. It's based on the fact that readxl::read_xlsx returns NA's for empty cells.

    df1 <- readxl::read_xlsx(xl_filename, skip = 2)
    
    i_col <- which(sapply(df1, function(x) all(is.na(x))))
    i_col <- i_col[which(i_col == min(i_col))]
    i_row <- which(apply(df1, 1, function(x) all(is.na(x))))
    i_row <- i_row[which(i_row == min(i_row))]
    
    df1[seq_len(i_row)[-i_row], seq_len(i_col)[-i_col]]