I'm relatively new to R programming and I'm trying to put some of what I'm learning in the Johns Hopkins Data Science track to practical use. Specifically, I would like to automate the process of downloading historical bond prices from the US Treasury website.
Using both Firefox and R, I was able to determine that the US Treasury website uses a very simple HTML POST form to specify a single date for the quotes of interest. It then returns a table of secondary market information for all outstanding bonds.
I have tried two different R packages to submit a request to the US Treasury web server, both without success. Here are the two approaches I tried:
Attempt #1 (using RCurl):
library(RCurl)
url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
td.html <- postForm(url,
submit = "Show Prices",
priceDate.year = 2014,
priceDate.month = 12,
priceDate.day = 15,
.opts = curlOptions(ssl.verifypeer = FALSE))
This results in a web page being returned and stored in td.html, but all it contains is an error message from the treasurydirect server. I know the server is working because when I submit the same request via my browser, I get the expected results.
Attempt #2 (using rvest):
library(rvest)
s <- html_session(url)
f0 <- html_form(s)
f1 <- set_values(f0[[2]], priceDate.year=2014, priceDate.month=12, priceDate.day=15)
test <- submit_form(s, f1)
Unfortunately, this approach doesn't even leave R and results in the following error message from R:
Submitting with 'submit'
Error in function (type, msg, asError = TRUE) : <url> malformed
I can't figure out how to see what "malformed" URL rvest is actually sending, so I can't diagnose the problem.
Any suggestions or tips for solving this seemingly simple task would be greatly appreciated!
Well, it appears to work with the httr library.
library(httr)
url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"
fd <- list(
submit = "Show Prices",
priceDate.year = 2014,
priceDate.month = 12,
priceDate.day = 15
)
resp <- POST(url, body = fd, encode = "form")
content(resp)
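Since the goal is a table of prices rather than raw HTML, here is a minimal sketch of turning a response body into a data frame. It assumes the page delivers the quotes in an ordinary HTML table, which I have not verified; extract_first_table is a hypothetical helper, and the sample HTML and its column names are made up so that the example runs without a network connection.

```r
library(rvest)

# Hypothetical helper: parse an HTML string and return its first <table>
# as a data frame. Stops with an error if the page contains no table.
extract_first_table <- function(html_text) {
  page <- read_html(html_text)
  tables <- html_table(html_nodes(page, "table"))
  if (length(tables) == 0) stop("no <table> found in the response")
  tables[[1]]
}

# Made-up stand-in for the server response (not real Treasury data).
sample_html <- "<html><body><table>
  <tr><th>id</th><th>price</th></tr>
  <tr><td>A1</td><td>99.5</td></tr>
</table></body></html>"

prices <- extract_first_table(sample_html)
print(prices)
```

With the real response, something like extract_first_table(content(resp, as = "text")) should be the equivalent call, assuming the quotes are in the first table on the page.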
The rvest library is really just a wrapper around httr. It looks like it doesn't do a good job of handling form URLs that are given as an absolute path without the server name. So if you look at
f1$url
# [1] /GA-FI/FedInvest/selectSecurityPriceDate.htm
you see that it has just the path and not the server name. This appears to be confusing httr. If you do
f1 <- set_values(f0[[2]], priceDate.year=2014, priceDate.month=12, priceDate.day=15)
f1$url <- url
test <- submit_form(s, f1)
that seems to work. Perhaps it's a bug that should be reported to rvest. (Tested on rvest_0.1.0.)
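If you would rather not hard-code the full address, the path-only form URL can be resolved against the page URL. This is only a sketch: it uses url_absolute() from the xml2 package, which is my own substitution and not something I tested with rvest_0.1.0, and the names page_url, form_path and full_url are just for illustration.

```r
library(xml2)

# URL of the page that contains the form (same value as `url` above).
page_url <- "https://www.treasurydirect.gov/GA-FI/FedInvest/selectSecurityPriceDate.htm"

# What the parsed form reports as its target: a path with no server name.
form_path <- "/GA-FI/FedInvest/selectSecurityPriceDate.htm"

# Resolve the path against the page URL to get a complete URL.
full_url <- url_absolute(form_path, page_url)
print(full_url)

# In the rvest session this would presumably replace the manual fix:
# f1$url <- url_absolute(f1$url, url)
```

Here the resolved URL is the same as the page URL, because the form posts back to the page it lives on.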