r excel url httr readxl

Reading Excel file into R using readxl and httr/libxls error: Unable to open file


I'd like to read sheet 1 of an Excel file hosted on a website, using the URL of the Excel file. I'm on Windows 10, R 3.6.1.

I'm trying to use the code from "Read Excel file from a URL using the readxl package" and have also checked out "reading excel files into a single dataframe with readxl R".

library(httr)
library(readxl)

url <- 'https://dataverse.harvard.edu/file.xhtml?persistentId=doi:10.7910/DVN/WEGWGS/I11K9Y&version=1.0'
GET(url, write_disk(tf <- tempfile(fileext = ".xls")))
df <- read_excel(tf, 1L)


I get the following error message:

libxls error: Unable to open file

Thanks for any help!


Solution

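  • The URL in the question appears to point to the file's landing page rather than to the file itself, so the download is most likely an HTML page, which libxls cannot open. If you scroll down the dataset page (https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/WEGWGS), you will see an explicit link for downloading the file directly (in the file metadata box).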

    If you do the following, similar to your code above, you can retrieve the file correctly:

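    library(httr)
    library(readxl)

    # Direct-download link taken from the file metadata box
    url <- 'https://dataverse.harvard.edu/api/access/datafile/:persistentId?persistentId=doi:10.7910/DVN/WEGWGS/I11K9Y'

    # Save the response body to a temporary .xlsx file
    GET(url, write_disk(tf <- tempfile(fileext = ".xlsx")))
    tf  # path of the downloaded file

    # Read the first sheet
    df <- read_excel(tf, 1L)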
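
If you want to confirm what was actually downloaded before calling read_excel, one option is to look at the first few bytes of the file. The sketch below uses base R only; `file_signature` is a hypothetical helper name (not part of readxl or httr), and it assumes the usual container signatures: a ZIP header for .xlsx and an OLE2 header for .xls. A saved HTML page matches neither, which would be consistent with the "Unable to open file" error in the question, though that cause is not confirmed here.

```r
# Hypothetical helper: classify a file by its leading "magic" bytes.
# Returns "xlsx" (ZIP container, as used by .xlsx; other ZIP files match too),
# "xls" (OLE2 container), or "unknown".
file_signature <- function(path) {
  bytes <- readBin(path, what = "raw", n = 8L)
  zip_magic  <- as.raw(c(0x50, 0x4B, 0x03, 0x04))
  ole2_magic <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  if (length(bytes) >= 4L && identical(bytes[1:4], zip_magic)) {
    "xlsx"
  } else if (length(bytes) >= 8L && identical(bytes[1:8], ole2_magic)) {
    "xls"
  } else {
    "unknown"
  }
}

# Stand-in for a landing page that was saved with an .xls extension
html_file <- tempfile(fileext = ".xls")
writeLines("<!DOCTYPE html><html><body>landing page</body></html>", html_file)
file_signature(html_file)  # "unknown"
```

Calling `file_signature(tf)` on the downloaded temp file should tell you whether you got a workbook or something else; if it returns "unknown", check the URL before debugging readxl.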