Search code examples
rxmlxml2

How to convert a xml file to a R data frame


I would like to convert the following xml file to a R data frame (https://www.dropbox.com/s/wwejpr0nfcj6gnw/20130919_20130920_SLD_FCST_ACTUAL_20210507_15_57_58_v1.xml?dl=1)

I have tried to use the XML package and xml2 package but had no luck. I am new to xml and would appreciate any help.

In the XML package I have tried xmlParse and xmlToDataFrame. In the xml2 package I have tried read_xml and xml_find_all functions and none of them worked.

Update: The solution provided by Onyambu below works for me but if other community members have other ways of solving this problem, please feel free to provide other solutions.


Solution

  • One way is to do:

    a <- 'https://www.dropbox.com/s/wwejpr0nfcj6gnw/20130919_20130920_SLD_FCST_ACTUAL_20210507_15_57_58_v1.xml?dl=1'
    
    library(purrr)
    library(rvest)
    
    h <- c("data_item", "resource_name", "opr_date", "interval_num", 
           "interval_start_gmt", "interval_end_gmt", "value")
    
    read_html(a) %>%
      html_nodes("rto report_item")%>%
      map_dfr(~map_dfc(setNames(h, h), 
                   ~html_text(html_nodes(.y, .x)), .y = .x), .id = "grp")
    
     A tibble: 120 x 8
       grp   data_item  resource_name opr_date interval_num interval_start_~ interval_end_gmt value
       <chr> <chr>      <chr>         <chr>    <chr>        <chr>            <chr>            <chr>
     1 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 7            2013-09-19T13:0~ 2013-09-19T14:0~ 20425
     2 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 24           2013-09-20T06:0~ 2013-09-20T07:0~ 20424
     3 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 12           2013-09-19T18:0~ 2013-09-19T19:0~ 20425
     4 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 14           2013-09-19T20:0~ 2013-09-19T21:0~ 20425
     5 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 18           2013-09-20T00:0~ 2013-09-20T01:0~ 20458
     6 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 19           2013-09-20T01:0~ 2013-09-20T02:0~ 20454
     7 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 21           2013-09-20T03:0~ 2013-09-20T04:0~ 20443
     8 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 23           2013-09-20T05:0~ 2013-09-20T06:0~ 20430
     9 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 4            2013-09-19T10:0~ 2013-09-19T11:0~ 20425
    10 1     SYS_FCST_~ CA ISO-TAC    2013-09~ 5            2013-09-19T11:0~ 2013-09-19T12:0~ 20425
    # ... with 110 more rows