Search code examples
rxlsx

Import online excel file with two headers


I am trying to download directly one supplementary material to R and I am having trouble. First, I tried:

datatable = fread("https://doi.org/10.1371/journal.pone.0242866.s001") 

The first problem was:

Error in fread("https://doi.org/10.1371/journal.pone.0242866.s001") : 
  embedded nul in string: '\xf7\xc5B1\xfcL\xf8xu~\xad\016\001\xddL5(\xb2I`\x8bҶ\005tL\xeePd\x97"{\024\xd9'p@C\031R\x9b#\x8a\034S\x9b\023\002\xa7\004Ψ\xcd9E.(rI\x91+\001\xddM~Mi7\x94vK\xe0\x8e\xd2&\t\x85&)\xc6\xd6\021U\xd6\a\023\x96X\177\033F\xde'J|\x9fY\x99\037L\030ǫ\xdc\017\xefe\xe1x\x91\xfd\x89\xd2?l\x84q6T\0\xc1\x84qD*\001\037\xab\032\xf0(\x8a Q\025\xf8X\x95A0a\xec\xb3'
In addition: Warning message:
In fread("https://doi.org/10.1371/journal.pone.0242866.s001") :
  Detected 1 column names but the data has 3 columns (i.e. invalid file). Added 2 extra default column names at the end.

I tried

datatable = data.table(read.csv("https://doi.org/10.1371/journal.pone.0242866.s001"))

However, the result brought me only one variable.

So I tried

datatable = data.table(read.csv2("https://doi.org/10.1371/journal.pone.0242866.s001"))

Again, the problem persisted but the number of observations

When I tried to use read_excel and add a skip to see if I can exclude the first line, it appeared an error in the path.

datatable = data.table(read_excel("https://doi.org/10.1371/journal.pone.0242866.s001"), skip = 1)

Error: `path` does not exist: ‘https://doi.org/10.1371/journal.pone.0242866.s001’

Can some help me?


Solution

  • You can use openxlsx and tail():

    library(openxlsx)
    
    datatable <- openxlsx::read.xlsx(
      'https://doi.org/10.1371/journal.pone.0242866.s001', sheet = 1
    ) %>%
    tail(-1)