I am trying to download an Excel file that I have the link to, but I have to log in to the site before I can download it. I have successfully got past the login page with rvest, RCurl and httr, but I am having an extremely difficult time downloading the file once I am logged in.
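library(rvest)

url <- "https://website.com/console/login.do"
download_url <- "https://website.com/file.xls"

# Open a session on the login page and take its first form
session <- html_session(url)
form <- html_form(session)[[1]]
# Fill in the credentials (user and pass are assumed to be defined earlier)
filled_form <- set_values(form,
                          userid = user,
                          password = pass)
## Save the logged-in session
main_page <- submit_form(session, filled_form)
# Try to fetch the file; this is the step that gives the wrong result
download.file(download_url, "./file.xls", method = "curl")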
When I run the download.file command, a file appears in my working directory, but it is not the file I am trying to download; it is just a corrupted .xls file with no data.
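One way to see what was actually saved is to look at the first bytes of the file. The following is only a minimal sketch: `looks_like_xls` is a hypothetical helper name, and it assumes the server delivers the legacy binary .xls format, which starts with the OLE2 compound-file signature. If the check fails, printing the first few lines as text may show whether the file is really an HTML login page.

```r
# Hypothetical helper: TRUE if the file starts with the OLE2 signature
# used by legacy binary .xls workbooks (assumption: the file is not .xlsx).
looks_like_xls <- function(path) {
  ole2_magic <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1))
  header <- readBin(path, what = "raw", n = 8)
  length(header) == 8 && all(header == ole2_magic)
}

if (file.exists("./file.xls")) {
  if (looks_like_xls("./file.xls")) {
    message("File has the legacy Excel (OLE2) signature.")
  } else {
    # Show the start of the file as text; an HTML page would appear as markup
    cat(readLines("./file.xls", n = 5, warn = FALSE), sep = "\n")
  }
}
```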
For reference, if I log in to the website in Chrome and paste the download link into the browser window after logging in, the file starts downloading automatically. If I do the same in IE, the file download dialog box pops up and asks whether I want to save the file.
Possibly relevant info:
Thanks in advance for your time!
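Someone on /r/rstats actually found the answer to this question. The likely cause is that download.file makes a separate request that does not carry the cookies of the logged-in session, so the file has to be requested through the session itself. The solution for my problem was as follows: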
# After logging in with submit_form(), request the file within the same session
download <- jump_to(main_page, download_url)
# Write the raw response body to the current working directory
writeBin(download$response$content, basename(download_url))
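For anyone on a newer rvest, the same idea can be sketched with the renamed functions. This assumes rvest 1.0.0 or later, where `html_session()`, `set_values()`, `submit_form()` and `jump_to()` were superseded by `session()`, `html_form_set()`, `session_submit()` and `session_jump_to()`. The wrapper name `download_with_login` and its `dest` argument are made up for illustration, and the form field names `userid` and `password` are taken from the code in the question.

```r
library(rvest)  # assumes rvest >= 1.0.0
library(httr)

# Hypothetical wrapper: log in, then fetch the file through the same session
download_with_login <- function(login_url, download_url, user, pass,
                                dest = basename(download_url)) {
  sess <- session(login_url)
  form <- html_form(sess)[[1]]
  filled <- html_form_set(form, userid = user, password = pass)

  # Submitting through the session keeps the login cookies
  logged_in <- session_submit(sess, filled)

  # Request the file with the authenticated session
  file_page <- session_jump_to(logged_in, download_url)
  stop_for_status(file_page$response)

  # Write the raw response body to disk
  writeBin(content(file_page$response, as = "raw"), dest)
  invisible(dest)
}
```

It would be called as `download_with_login(url, download_url, user, pass)`, which writes file.xls to the current working directory.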