Search code examples
rpostweb-scrapinghttr

Send SQL string through POST with httr package in R


I'm trying to download a file from this website https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=289. The form on that website, generates a POST link that submits a request to their server, to create a temporary file stored here https://transtats.bts.gov/ftproot/TranStatsData/.

As for the form data, I can see the following:

UserTableName: DB1BCoupon
DBShortName: 
RawDataTable: T_DB1B_COUPON
sqlstr: +SELECT+ORIGIN_AIRPORT_ID%2CORIGIN_AIRPORT_SEQ_ID%2CORIGIN_CITY_MARKET_ID%2CDEST_AIRPORT_ID%2CDEST_AIRPORT_SEQ_ID%2CDEST_CITY_MARKET_ID+FROM++T_DB1B_COUPON+WHERE+Quarter+%3D1+AND+YEAR%3D2017
varlist: ORIGIN_AIRPORT_ID%2CORIGIN_AIRPORT_SEQ_ID%2CORIGIN_CITY_MARKET_ID%2CDEST_AIRPORT_ID%2CDEST_AIRPORT_SEQ_ID%2CDEST_CITY_MARKET_ID

Based on the above, and with the httr package I've been trying the following:

library(httr)

web <- https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=289

POST(web, body = "+SELECT+ORIGIN_AIRPORT_ID%2CORIGIN_AIRPORT_SEQ_ID%2CORIGIN_CITY_MARKET_ID%2CDEST_AIRPORT_ID%2CDEST_AIRPORT_SEQ_ID%2CDEST_CITY_MARKET_ID+FROM++T_DB1B_COUPON+WHERE+Quarter+%3D1+AND+YEAR%3D2017", encode = "form")

Now I would expect to get a response header with the following info:

Location: https://transtats.bts.gov/ftproot/TranStatsData/847324776_T_DB1B_COUPON.zip

However, for some reason I can't seem to get this. I'm sure that the code for the POST is wrong but I'm not sure where or what am I doing wrong.


Solution

  • Better late than never (for an answer)?

    That POST is super complex and the site does a redirect after processing it to another GET to grab the ZIP content.

    enter image description here

    Right-click on the POST line and choose "Copy as cURL". Do not modify the clipboard at all after that, then use curlconverter to turn it into an R function:

    library(curlconverter)
    
    straighten() %>% make_req() -> tmp # it automagically uses the clipboard contents
    

    Hit "paste" on your OS and you'll get a longer version of this:

    httr::POST(
      url = "https://www.transtats.bts.gov/DownLoad_Table.asp",
      httr::add_headers(
        Referer = "https://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=289"
      ),
      body = list(
        UserTableName = "DB1BCoupon",
        DBShortName = "", 
        RawDataTable = "T_DB1B_COUPON",
        sqlstr = " SELECT ORIGIN_AIRPORT_ID,ORIGIN_AIRPORT_SEQ_ID,ORIGIN_CITY_MARKET_ID,DEST_AIRPORT_ID,DEST_AIRPORT_SEQ_ID,DEST_CITY_MARKET_ID FROM T_DB1B_COUPON WHERE Quarter=1 AND YEAR=2018",
        varlist = "ORIGIN_AIRPORT_ID,ORIGIN_AIRPORT_SEQ_ID,ORIGIN_CITY_MARKET_ID,DEST_AIRPORT_ID,DEST_AIRPORT_SEQ_ID,DEST_CITY_MARKET_ID",
        grouplist = "", suml = "",
        sumRegion = "", filter1 = "title=",
        filter2 = "title=", geo = "All\xa0",
        time = "Q+1", timename = "Quarter",
        GEOGRAPHY = "All", XYEAR = "2018",
        FREQUENCY = "1", 
        VarDesc = "ItinID", VarType = "Num", 
        VarDesc = "MktID", VarType = "Num", 
        VarDesc = "SeqNum", VarType = "Num", 
        VarDesc = "Coupons", VarType = "Num", 
        VarDesc = "Year", VarType = "Num", VarName = "ORIGIN_AIRPORT_ID", 
        VarDesc = "OriginAirportID", VarType = "Num", VarName = "ORIGIN_AIRPORT_SEQ_ID", 
        VarDesc = "OriginAirportSeqID", VarType = "Num", VarName = "ORIGIN_CITY_MARKET_ID", 
        VarDesc = "OriginCityMarketID", VarType = "Num", 
        VarDesc = "Quarter", VarType = "Num", 
        VarDesc = "Origin", VarType = "Char", 
        VarDesc = "OriginCountry", VarType = "Char", 
        VarDesc = "OriginStateFips", VarType = "Char", 
        VarDesc = "OriginState", VarType = "Char", 
        VarDesc = "OriginStateName", VarType = "Char", 
        VarDesc = "OriginWac", VarType = "Num", VarName = "DEST_AIRPORT_ID", 
        VarDesc = "DestAirportID", VarType = "Num", VarName = "DEST_AIRPORT_SEQ_ID", 
        VarDesc = "DestAirportSeqID", VarType = "Num", VarName = "DEST_CITY_MARKET_ID", 
        VarDesc = "DestCityMarketID", VarType = "Num", 
        VarDesc = "Dest", VarType = "Char", 
        VarDesc = "DestCountry", VarType = "Char", 
        VarDesc = "DestStateFips", VarType = "Char", 
        VarDesc = "DestState", VarType = "Char", 
        VarDesc = "DestStateName", VarType = "Char", 
        VarDesc = "DestWac", VarType = "Num", 
        VarDesc = "Break", VarType = "Char", 
        VarDesc = "CouponType", VarType = "Char", 
        VarDesc = "TkCarrier", VarType = "Char", 
        VarDesc = "OpCarrier", VarType = "Char", 
        VarDesc = "RPCarrier", VarType = "Char", 
        VarDesc = "Passengers", VarType = "Num", 
        VarDesc = "FareClass", VarType = "Char", 
        VarDesc = "Distance", VarType = "Num", 
        VarDesc = "DistanceGroup", VarType = "Num", 
        VarDesc = "Gateway", VarType = "Num", 
        VarDesc = "ItinGeoType", VarType = "Num", 
        VarDesc = "CouponGeoType", VarType = "Num"
      ), 
      encode = "form",
      query = list(
        Table_ID = "289",
        Has_Group = "0", 
        Is_Zipped = "0"
      )
    ) -> res
    

    (yep, even longer than that was)

    The sqlstr parameter has the SQL query, and I'm not sure how much of that POST is "required" or not but it "worked for me"

    res is definitely big and has binary zip'd data:

    res
    ## Response [https://transtats.bts.gov/ftproot/TranStatsData/351117019_T_DB1B_COUPON.zip]
    ##   Date: 2018-10-14 02:18
    ##   Status: 200
    ##   Content-Type: application/x-zip-compressed
    ##   Size: 14.6 MB
    ## <BINARY BODY>
    

    We can save it to disk and make sure it's valid:

    (save_to <- file.path("~/Data", basename(grep("\\.zip", unlist(res$all_headers), value=TRUE))))
    ## [1] "~/Data/351117019_T_DB1B_COUPON.zip"
    
    writeBin(httr::content(res, as="raw"), save_to)
    
    unzip(save_to, list = TRUE)
    ##                          Name    Length                Date
    ## 1 351117019_T_DB1B_COUPON.csv 378311108 2018-10-13 22:18:00