Search code examples
rxmlweb-scrapingrcurlrvest

Submitting an XML request form to an API using R


I was previously (and still) having trouble navigating from a beginning link, in order to get to the next link to fill in a data request form and then extract the information from the resulting table. My coding attempt is here.

I have been informed that CDC Wonder does have an API in order to submit an XML request form within R. The link to all the details on how to submit a form is here.

However, I have no idea how to even send an XML request form using R and have tried searching around for solutions. If anyone can get me started based on the directions of the API, and one of the XML form examples that they have listed, then I think I can figure out the rest.

The code below shows my attempt using the first example XML request form:

request_xml <-  
"<?xml version="1.0" encoding="UTF-8"?>
 <request-parameters>
 <parameter>
  <name>accept_datause_restrictions</name>
  <value>true</value>
 </parameter>
 <parameter>
  <name>B_1</name>
  <value>D76.V1-level1</value>
 </parameter>
 <parameter>
  <name>B_2</name>
  <value>D76.V8</value>
 </parameter>
 <parameter>
  <name>B_3</name>
  <value>*None*</value>
 </parameter>
 <parameter>
  <name>B_4</name>
  <value>*None*</value>
 </parameter>
 <parameter>
  <name>B_5</name>
  <value>*None*</value>
 </parameter>
 <parameter>
  <name>F_D76.V1</name>
  <value>2009</value>
  <value>2010</value>
  <value>2011</value>
  <value>2012</value>
  <value>2013</value>
 </parameter>
 <parameter>
  <name>F_D76.V10</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>F_D76.V2</name>
  <value>C00-D48</value>
 </parameter>
 <parameter>
  <name>F_D76.V27</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>F_D76.V9</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>I_D76.V1</name>
  <value>
   2009 (2009) 2010 (2010) 2011 (2011) 2012 (2012) 2013 (2013)
  </value>
 </parameter>
 <parameter>
  <name>I_D76.V10</name>
  <value>*All* (The United States)</value>
 </parameter>
 <parameter>
  <name>I_D76.V2</name>
  <value>C00-D48 (Neoplasms)</value>
 </parameter>
 <parameter>
  <name>I_D76.V27</name>
  <value>*All* (The United States)</value>
 </parameter>
 <parameter>
  <name>I_D76.V9</name>
  <value>*All* (The United States)</value>
 </parameter>
 <parameter>
  <name>M_1</name>
  <value>D76.M1</value>
 </parameter>
 <parameter>
  <name>M_2</name>
  <value>D76.M2</value>
 </parameter>
 <parameter>
  <name>M_3</name>
  <value>D76.M3</value>
 </parameter>
 <parameter>
  <name>M_41</name>
  <value>D76.M41</value>
 </parameter>
 <parameter>
  <name>M_42</name>
  <value>D76.M42</value>
 </parameter>
 <parameter>
  <name>O_V10_fmode</name>
  <value>freg</value>
 </parameter>
 <parameter>
  <name>O_V1_fmode</name>
  <value>freg</value>
 </parameter>
 <parameter>
  <name>O_V27_fmode</name>
  <value>freg</value>
 </parameter>
 <parameter>
  <name>O_V2_fmode</name>
  <value>freg</value>
 </parameter>
 <parameter>
  <name>O_V9_fmode</name>
  <value>freg</value>
 </parameter>
 <parameter>
  <name>O_aar</name>
  <value>aar_std</value>
 </parameter>
 <parameter>
  <name>O_aar_pop</name>
  <value>0000</value>
 </parameter>
 <parameter>
  <name>O_age</name>
  <value>D76.V5</value>
 </parameter>
 <parameter>
  <name>O_javascript</name>
  <value>on</value>
 </parameter>
 <parameter>
  <name>O_location</name>
  <value>D76.V9</value>
 </parameter>
 <parameter>
  <name>O_precision</name>
  <value>1</value>
 </parameter>
 <parameter>
  <name>O_rate_per</name>
  <value>100000</value>
 </parameter>
 <parameter>
  <name>O_show_totals</name>
  <value>true</value>
 </parameter>
 <parameter>
  <name>O_timeout</name>
  <value>300</value>
 </parameter>
 <parameter>
  <name>O_title</name>
  <value>Example1</value>
 </parameter>
 <parameter>
  <name>O_ucd</name>
  <value>D76.V2</value>
 </parameter>
 <parameter>
  <name>O_urban</name>
  <value>D76.V19</value>
 </parameter>
 <parameter>
  <name>VM_D76.M6_D76.V10</name>
  <value/>
 </parameter>
 <parameter>
  <name>VM_D76.M6_D76.V17</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>VM_D76.M6_D76.V1_S</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>VM_D76.M6_D76.V7</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>VM_D76.M6_D76.V8</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V1</name>
  <value/>
 </parameter>
 <parameter>
  <name>V_D76.V10</name>
  <value/>
 </parameter>
 <parameter>
  <name>V_D76.V11</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V12</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V17</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V19</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V2</name>
  <value/>
 </parameter>
 <parameter>
  <name>V_D76.V20</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V21</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V22</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V23</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V24</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V25</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V27</name>
  <value/>
 </parameter>
 <parameter>
  <name>V_D76.V4</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V5</name>
  <value>1</value>
  <value>1-4</value>
  <value>5-14</value>
  <value>15-24</value>
  <value>25-34</value>
  <value>35-44</value>
  <value>45-54</value>
  <value>55-64</value>
  <value>65-74</value>
  <value>75-84</value>
  <value>85+</value>
 </parameter>
 <parameter>
  <name>V_D76.V51</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V52</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V6</name>
  <value>00</value>
 </parameter>
 <parameter>
  <name>V_D76.V7</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V8</name>
  <value>*All*</value>
 </parameter>
 <parameter>
  <name>V_D76.V9</name>
  <value/>
 </parameter>
 <parameter>
  <name>action-Send</name>
  <value>Send</value>
 </parameter>
 <parameter>
  <name>finder-stage-D76.V1</name>
  <value>codeset</value>
 </parameter>
 <parameter>
  <name>finder-stage-D76.V10</name>
  <value>codeset</value>
 </parameter>
 <parameter>
  <name>finder-stage-D76.V2</name>
  <value>codeset</value>
 </parameter>
 <parameter>
  <name>finder-stage-D76.V27</name>
  <value>codeset</value>
 </parameter>
 <parameter>
  <name>finder-stage-D76.V9</name>
  <value>codeset</value>
 </parameter>
 <parameter>
  <name>stage</name>
  <value>request</value>
 </parameter>
</request-parameters>"  

library(RCurl)
url <- "http://wonder.cdc.gov/controller/datarequest/D76"
data <- getURL(
 url = url,
 postfields = request_xml,
 verbose = TRUE
)

Thanks for your time!

ACE


Solution

  • Despite some folks heralding the CDC WONDER API as awesome, it seems like a pretty cruddy API interface to me. But, I can see why they might be hesitant to expose a SQL interface (which is what this really needs) due to the plethora of attack methods against SQL.

    Actually having to craft XML by hand to make those queries is just horribad. So, you should use wondr, an R package that works with the API.

    It's still ugly (here's one of their example queries):

    library(wondr) ## devtools::install_github("hrbrmstr/wondr")
    
    wondr() %>%
      add_param("B_1", "D76.V22") %>% 
      add_param("B_2", "D76.V23") %>% 
      add_param("B_3", "*None*") %>% 
      add_param("B_4", "*None*") %>% 
      add_param("B_5", "*None*") %>% 
      add_param("F_D76.V1", "*All*") %>% 
      add_param("F_D76.V10", "*All*") %>% 
      add_param("F_D76.V2", "*All*") %>% 
      add_param("F_D76.V27", "*All*") %>% 
      add_param("F_D76.V9", "*All*") %>% 
      add_param("I_D76.V1", "*All* (All Dates)") %>% 
      add_param("I_D76.V10", "*All* (The United States)") %>% 
      add_param("I_D76.V2", "*All* (All Causes of Death)") %>% 
      add_param("I_D76.V27", "*All* (The United States)") %>% 
      add_param("I_D76.V9", "*All* (The United States)") %>% 
      add_param("M_1", "D76.M1") %>% 
      add_param("M_2", "D76.M2") %>% 
      add_param("M_3", "D76.M3") %>% 
      add_param("O_V10_fmode", "freg") %>% 
      add_param("O_V1_fmode", "freg") %>% 
      add_param("O_V27_fmode", "freg") %>% 
      add_param("O_V2_fmode", "freg") %>% 
      add_param("O_V9_fmode", "freg") %>% 
      add_param("O_aar", "aar_none") %>% 
      add_param("O_aar_pop", "0000") %>% 
      add_param("O_age", "D76.V52") %>% 
      add_param("O_javascript", "on") %>% 
      add_param("O_location", "D76.V9") %>% 
      add_param("O_precision", "1") %>% 
      add_param("O_rate_per", "100000") %>% 
      add_param("O_show_totals", "true") %>% 
      add_param("O_timeout", "300") %>% 
      add_param("O_title", "Example2") %>% 
      add_param("O_ucd", "D76.V22") %>% 
      add_param("O_urban", "D76.V19") %>% 
      add_param("VM_D76.M6_D76.V10", "") %>% 
      add_param("VM_D76.M6_D76.V17", "*All*") %>% 
      add_param("VM_D76.M6_D76.V1_S", "*All*") %>% 
      add_param("VM_D76.M6_D76.V7", "*All*") %>% 
      add_param("VM_D76.M6_D76.V8", "*All*") %>% 
      add_param("V_D76.V1", "") %>% 
      add_param("V_D76.V10", "") %>% 
      add_param("V_D76.V11", "*All*") %>% 
      add_param("V_D76.V12", "*All*") %>% 
      add_param("V_D76.V17", "*All*") %>% 
      add_param("V_D76.V19", "*All*") %>% 
      add_param("V_D76.V2", "") %>% 
      add_param("V_D76.V20", "*All*") %>% 
      add_param("V_D76.V21", "*All*") %>% 
      add_param("V_D76.V22", "1", "2", "3", "4", "5") %>% 
      add_param("V_D76.V23", "*All*") %>% 
      add_param("V_D76.V24", "*All*") %>% 
      add_param("V_D76.V25", "*All*") %>% 
      add_param("V_D76.V27", "") %>% 
      add_param("V_D76.V4", "*All*") %>% 
      add_param("V_D76.V5", "*All*") %>% 
      add_param("V_D76.V51", "*All*") %>% 
      add_param("V_D76.V52", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", "17", "18") %>% 
      add_param("V_D76.V6", "00") %>% 
      add_param("V_D76.V7", "*All*") %>% 
      add_param("V_D76.V8", "*All*") %>% 
      add_param("V_D76.V9", "") %>% 
      add_param("action-Send", "Send") %>% 
      add_param("finder-stage-D76.V1", "codeset") %>% 
      add_param("finder-stage-D76.V10", "codeset") %>% 
      add_param("finder-stage-D76.V2", "codeset") %>% 
      add_param("finder-stage-D76.V27", "codeset") %>% 
      add_param("finder-stage-D76.V9", "codeset") %>% 
      add_param("stage", "request") %>%  
      make_query("D76") -> query_result
    

    You need to parse the result of that:

    library(xml2)
    library(purrr)
    
    xml_find_all(query_result, ".//response/data-table/r") %>% 
      map_df(function(row) {
        xml_find_all(row, ".//c") %>% 
          xml_attrs() %>% 
          as.list() %>% 
          setNames(sprintf("V%d", 1:length(.))) %>% 
          as.data.frame(stringsAsFactors=FALSE)
      }) -> df
    
    print(df)
    ##                                        V1                                               V2            V3            V4         V5
    ## 1                           Unintentional                                       Cut/Pierce            97 1,243,249,173        0.0
    ## 2                           Unintentional                                         Drowning        15,945 1,243,249,173        1.3
    ## 3                           Unintentional                                             Fall         2,213 1,243,249,173        0.2
    ## 4                           Unintentional                                       Fire/Flame         7,301 1,243,249,173        0.6
    ## 5                           Unintentional                             Hot object/Substance            96 1,243,249,173        0.0
    ## 6                           Unintentional                                          Firearm         2,042 1,243,249,173        0.2
    ## 7                           Unintentional                                        Machinery           484 1,243,249,173        0.0
    ## 8                           Unintentional                            Motor Vehicle Traffic        74,997 1,243,249,173        6.0
    ## 9                           Unintentional                              Other Pedal cyclist           458 1,243,249,173        0.0
    ## 10                          Unintentional                                 Other Pedestrian         3,221 1,243,249,173        0.3
    ## 11                          Unintentional                             Other land transport         3,449 1,243,249,173        0.3
    ## 12                          Unintentional                                  Other transport         1,344 1,243,249,173        0.1
    ## 13                          Unintentional                            Natural/Environmental         1,681 1,243,249,173        0.1
    ## 14                          Unintentional                                     Overexertion             2 1,243,249,173 Unreliable
    ## 15                          Unintentional                                        Poisoning         7,326 1,243,249,173        0.6
    ## 16                          Unintentional                             Struck by or against         1,378 1,243,249,173        0.1
    ## 17                          Unintentional                                      Suffocation        17,356 1,243,249,173        1.4
    ## 18                          Unintentional             Other specified, classifiable Injury         1,199 1,243,249,173        0.1
    ## 19                          Unintentional Other specified, not elsewhere classified Injury           518 1,243,249,173        0.0
    ## 20                          Unintentional                               Unspecified Injury         1,540 1,243,249,173        0.1
    ## 21                          Unintentional                                                1       142,647 1,243,249,173       11.5
    ## 22                                Suicide                                       Cut/Pierce            64 1,243,249,173        0.0
    ## 23                                Suicide                                         Drowning           111 1,243,249,173        0.0
    ## 24                                Suicide                                             Fall           412 1,243,249,173        0.0
    ## 25                                Suicide                                       Fire/Flame            50 1,243,249,173        0.0
    ## 26                                Suicide                                          Firearm         9,956 1,243,249,173        0.8
    ## 27                                Suicide                             Other land transport           139 1,243,249,173        0.0
    ## 28                                Suicide                                        Poisoning         1,336 1,243,249,173        0.1
    ## 29                                Suicide                                      Suffocation        10,559 1,243,249,173        0.8
    ## 30                                Suicide             Other specified, classifiable Injury           379 1,243,249,173        0.0
    ## 31                                Suicide Other specified, not elsewhere classified Injury           103 1,243,249,173        0.0
    ## 32                                Suicide                               Unspecified Injury            76 1,243,249,173        0.0
    ## 33                                Suicide                                                1        23,185 1,243,249,173        1.9
    ## 34                               Homicide                                       Cut/Pierce         2,375 1,243,249,173        0.2
    ## 35                               Homicide                                         Drowning           459 1,243,249,173        0.0
    ## 36                               Homicide                                             Fall            33 1,243,249,173        0.0
    ## 37                               Homicide                                       Fire/Flame           561 1,243,249,173        0.0
    ## 38                               Homicide                             Hot object/Substance            47 1,243,249,173        0.0
    ## 39                               Homicide                                          Firearm        20,897 1,243,249,173        1.7
    ## 40                               Homicide                             Other land transport           131 1,243,249,173        0.0
    ## 41                               Homicide                                  Other transport             8 1,243,249,173 Unreliable
    ## 42                               Homicide                                        Poisoning           494 1,243,249,173        0.0
    ## 43                               Homicide                             Struck by or against           338 1,243,249,173        0.0
    ## 44                               Homicide                                      Suffocation         1,582 1,243,249,173        0.1
    ## 45                               Homicide             Other specified, classifiable Injury         2,912 1,243,249,173        0.2
    ## 46                               Homicide Other specified, not elsewhere classified Injury         1,137 1,243,249,173        0.1
    ## 47                               Homicide                               Unspecified Injury         5,821 1,243,249,173        0.5
    ## 48                               Homicide                                                1        36,795 1,243,249,173        3.0
    ## 49                          Undetermined                                        Cut/Pierce            10 1,243,249,173 Unreliable
    ## 50                          Undetermined                                          Drowning           377 1,243,249,173        0.0
    ## 51                          Undetermined                                              Fall            96 1,243,249,173        0.0
    ## 52                          Undetermined                                        Fire/Flame           256 1,243,249,173        0.0
    ## 53                          Undetermined                              Hot object/Substance             4 1,243,249,173 Unreliable
    ## 54                          Undetermined                                           Firearm           501 1,243,249,173        0.0
    ## 55                          Undetermined                              Other land transport            33 1,243,249,173        0.0
    ## 56                          Undetermined                                         Poisoning         1,217 1,243,249,173        0.1
    ## 57                          Undetermined                              Struck by or against             2 1,243,249,173 Unreliable
    ## 58                          Undetermined                                       Suffocation         1,234 1,243,249,173        0.1
    ## 59                          Undetermined              Other specified, classifiable Injury            23 1,243,249,173        0.0
    ## 60                          Undetermined  Other specified, not elsewhere classified Injury           267 1,243,249,173        0.0
    ## 61                          Undetermined                                Unspecified Injury           743 1,243,249,173        0.1
    ## 62                          Undetermined                                                 1         4,763 1,243,249,173        0.4
    ## 63 Legal Intervention / Operations of War                                          Firearm           251 1,243,249,173        0.0
    ## 64 Legal Intervention / Operations of War             Other specified, classifiable Injury             1 1,243,249,173 Unreliable
    ## 65 Legal Intervention / Operations of War Other specified, not elsewhere classified Injury            10 1,243,249,173 Unreliable
    ## 66 Legal Intervention / Operations of War                               Unspecified Injury             2 1,243,249,173 Unreliable
    ## 67 Legal Intervention / Operations of War                                                1           264 1,243,249,173        0.0
    ## 68                                      2                                          207,654 1,243,249,173          16.7       <NA>
    

    And, you still need to clean that up, but you've at least got a way to craft, execute and parse queries, now.

    It might be worth petitioning the rOpenSci folks to adopt this pkg (I don't recall seeing it in their vast library). I'll hit them up on it in a few weeks, too.