I'm trying to download a file from this website https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=289. The form on that website, generates a POST link that submits a request to their server, to create a temporary file stored here https://transtats.bts.gov/ftproot/TranStatsData/.
As for the form data, I can see the following:
UserTableName: DB1BCoupon
DBShortName:
RawDataTable: T_DB1B_COUPON
sqlstr: +SELECT+ORIGIN_AIRPORT_ID%2CORIGIN_AIRPORT_SEQ_ID%2CORIGIN_CITY_MARKET_ID%2CDEST_AIRPORT_ID%2CDEST_AIRPORT_SEQ_ID%2CDEST_CITY_MARKET_ID+FROM++T_DB1B_COUPON+WHERE+Quarter+%3D1+AND+YEAR%3D2017
varlist: ORIGIN_AIRPORT_ID%2CORIGIN_AIRPORT_SEQ_ID%2CORIGIN_CITY_MARKET_ID%2CDEST_AIRPORT_ID%2CDEST_AIRPORT_SEQ_ID%2CDEST_CITY_MARKET_ID
Based on the above, and with the httr package I've been trying the following:
library(httr)
web <- https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=289
POST(web, body = "+SELECT+ORIGIN_AIRPORT_ID%2CORIGIN_AIRPORT_SEQ_ID%2CORIGIN_CITY_MARKET_ID%2CDEST_AIRPORT_ID%2CDEST_AIRPORT_SEQ_ID%2CDEST_CITY_MARKET_ID+FROM++T_DB1B_COUPON+WHERE+Quarter+%3D1+AND+YEAR%3D2017", encode = "form")
Now I would expect to get a response header with the following info:
Location: https://transtats.bts.gov/ftproot/TranStatsData/847324776_T_DB1B_COUPON.zip
However, for some reason I can't seem to get this. I'm sure that the code for the POST is wrong but I'm not sure where or what am I doing wrong.
Better late than never (for an answer)?
That POST
is super complex and the site does a redirect after processing it to another GET
to grab the ZIP content.
Right-click on the POST
line and choose "Copy as cURL". Do not modify the clipboard at all after that, then use curlconverter
to turn it into an R function:
library(curlconverter)
straighten() %>% make_req() -> tmp # it automagically uses the clipboard contents
Hit "paste" on your OS and you'll get a longer version of this:
httr::POST(
url = "https://www.transtats.bts.gov/DownLoad_Table.asp",
httr::add_headers(
Referer = "https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=289"
),
body = list(
UserTableName = "DB1BCoupon",
DBShortName = "",
RawDataTable = "T_DB1B_COUPON",
sqlstr = " SELECT ORIGIN_AIRPORT_ID,ORIGIN_AIRPORT_SEQ_ID,ORIGIN_CITY_MARKET_ID,DEST_AIRPORT_ID,DEST_AIRPORT_SEQ_ID,DEST_CITY_MARKET_ID FROM T_DB1B_COUPON WHERE Quarter=1 AND YEAR=2018",
varlist = "ORIGIN_AIRPORT_ID,ORIGIN_AIRPORT_SEQ_ID,ORIGIN_CITY_MARKET_ID,DEST_AIRPORT_ID,DEST_AIRPORT_SEQ_ID,DEST_CITY_MARKET_ID",
grouplist = "", suml = "",
sumRegion = "", filter1 = "title=",
filter2 = "title=", geo = "All\xa0",
time = "Q+1", timename = "Quarter",
GEOGRAPHY = "All", XYEAR = "2018",
FREQUENCY = "1",
VarDesc = "ItinID", VarType = "Num",
VarDesc = "MktID", VarType = "Num",
VarDesc = "SeqNum", VarType = "Num",
VarDesc = "Coupons", VarType = "Num",
VarDesc = "Year", VarType = "Num", VarName = "ORIGIN_AIRPORT_ID",
VarDesc = "OriginAirportID", VarType = "Num", VarName = "ORIGIN_AIRPORT_SEQ_ID",
VarDesc = "OriginAirportSeqID", VarType = "Num", VarName = "ORIGIN_CITY_MARKET_ID",
VarDesc = "OriginCityMarketID", VarType = "Num",
VarDesc = "Quarter", VarType = "Num",
VarDesc = "Origin", VarType = "Char",
VarDesc = "OriginCountry", VarType = "Char",
VarDesc = "OriginStateFips", VarType = "Char",
VarDesc = "OriginState", VarType = "Char",
VarDesc = "OriginStateName", VarType = "Char",
VarDesc = "OriginWac", VarType = "Num", VarName = "DEST_AIRPORT_ID",
VarDesc = "DestAirportID", VarType = "Num", VarName = "DEST_AIRPORT_SEQ_ID",
VarDesc = "DestAirportSeqID", VarType = "Num", VarName = "DEST_CITY_MARKET_ID",
VarDesc = "DestCityMarketID", VarType = "Num",
VarDesc = "Dest", VarType = "Char",
VarDesc = "DestCountry", VarType = "Char",
VarDesc = "DestStateFips", VarType = "Char",
VarDesc = "DestState", VarType = "Char",
VarDesc = "DestStateName", VarType = "Char",
VarDesc = "DestWac", VarType = "Num",
VarDesc = "Break", VarType = "Char",
VarDesc = "CouponType", VarType = "Char",
VarDesc = "TkCarrier", VarType = "Char",
VarDesc = "OpCarrier", VarType = "Char",
VarDesc = "RPCarrier", VarType = "Char",
VarDesc = "Passengers", VarType = "Num",
VarDesc = "FareClass", VarType = "Char",
VarDesc = "Distance", VarType = "Num",
VarDesc = "DistanceGroup", VarType = "Num",
VarDesc = "Gateway", VarType = "Num",
VarDesc = "ItinGeoType", VarType = "Num",
VarDesc = "CouponGeoType", VarType = "Num"
),
encode = "form",
query = list(
Table_ID = "289",
Has_Group = "0",
Is_Zipped = "0"
)
) -> res
(yep, even longer than that was)
The sqlstr
parameter has the SQL query, and I'm not sure how much of that POST
is "required" or not but it "worked for me"
res
is definitely big and has binary zip'd data:
res
## Response [https://transtats.bts.gov/ftproot/TranStatsData/351117019_T_DB1B_COUPON.zip]
## Date: 2018-10-14 02:18
## Status: 200
## Content-Type: application/x-zip-compressed
## Size: 14.6 MB
## <BINARY BODY>
We can save it to disk and make sure it's valid:
(save_to <- file.path("~/Data", basename(grep("\\.zip", unlist(res$all_headers), value=TRUE))))
## [1] "~/Data/351117019_T_DB1B_COUPON.zip"
writeBin(httr::content(res, as="raw"), save_to)
unzip(save_to, list = TRUE)
## Name Length Date
## 1 351117019_T_DB1B_COUPON.csv 378311108 2018-10-13 22:18:00