Does anyone have any experience importing data into R from an Atom-compliant data feed? I have downloaded a ".atomsvc" file, opened its contents in Notepad, and get the following:
<?xml version="1.0" encoding="utf-8" standalone="yes"?><service xmlns:atom="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app" xmlns="http://www.w3.org/2007/app"><workspace><atom:title>OperationallyAvailableCapacity</atom:title><collection href="http://10.101.111.234/ReportServer?%2FInfoPost%2FOperationallyAvailableCapacity&AssetNbr=51&beg_date=05%2F03%2F2013%2000%3A00%3A00&LocationNbr=%25&LocationProp=%25&LocationName=%25&DirOfLow=%25&rs%3AParameterLanguage=&rs%3ACommand=Render&rs%3AFormat=ATOM&rc%3ADataFeed=xAx0x13"><atom:title>table1</atom:title></collection></workspace></service>
I'm guessing that to import this I will likely have to use RCurl, but since I have limited experience with that package I was hoping someone could point me in the right direction.
Any assistance would be appreciated.
Feeds just give you the information in XML format, which can be parsed using the XML package.
library(XML)
url <- 'http://housesofstones.com/blog/feed/atom/'
# Download and parse the data
xml_data <- xmlParse(url)
# Convert the xml structure to a list so you can work with it in R
xml_list <- xmlToList(xml_data)
str(head(xml_list))
List of 6
$ title :List of 2
..$ text : chr "Houses of Stones"
..$ .attrs: Named chr "text"
.. ..- attr(*, "names")= chr "type"
$ subtitle:List of 2
..$ text : chr "\"Science is facts; just as houses are made of stones, so is science made of facts; but a pile of stones is not a house and a c"| __truncated__
..$ .attrs: Named chr "text"
.. ..- attr(*, "names")= chr "type"
$ updated : chr "2013-05-16T12:16:49Z"
$ link : Named chr [1:3] "alternate" "text/html" "http://housesofstones.com/blog"
..- attr(*, "names")= chr [1:3] "rel" "type" "href"
$ id : chr "http://housesofstones.com/blog/feed/atom/"
$ link : Named chr [1:3] "self" "application/atom+xml" "http://housesofstones.com/blog/feed/atom/"
..- attr(*, "names")= chr [1:3] "rel" "type" "href"
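If you only need particular fields rather than the whole nested list, you can query the parsed document with XPath instead. Atom elements live in the namespace http://www.w3.org/2005/Atom, so the XPath expressions need a prefix bound to that URI. Here is a minimal sketch on a small made-up feed (the feed contents and the `entries` data frame are just for illustration):

```r
library(XML)

# A tiny, made-up Atom feed used only for illustration
feed_xml <- '<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry><title>First post</title><updated>2013-05-01T00:00:00Z</updated></entry>
  <entry><title>Second post</title><updated>2013-05-02T00:00:00Z</updated></entry>
</feed>'

doc <- xmlParse(feed_xml, asText = TRUE)

# Atom elements are in a default namespace, so bind a prefix for use in XPath
ns <- c(atom = "http://www.w3.org/2005/Atom")

# One row per <entry>, pulling the text of the child elements we care about
entries <- data.frame(
  title   = xpathSApply(doc, "//atom:entry/atom:title", xmlValue, namespaces = ns),
  updated = xpathSApply(doc, "//atom:entry/atom:updated", xmlValue, namespaces = ns),
  stringsAsFactors = FALSE
)
print(entries)
```

The same `xpathSApply` calls should work on a feed parsed straight from a URL, as long as the feed uses the standard Atom namespace.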
Or, using your example data:
example_data <- '<?xml version="1.0" encoding="utf-8" standalone="yes"?><service xmlns:atom="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app" xmlns="http://www.w3.org/2007/app"><workspace><atom:title>OperationallyAvailableCapacity</atom:title><collection href="http://10.101.111.234/ReportServer?%2FInfoPost%2FOperationallyAvailableCapacity&AssetNbr=51&beg_date=05%2F03%2F2013%2000%3A00%3A00&LocationNbr=%25&LocationProp=%25&LocationName=%25&DirOfLow=%25&rs%3AParameterLanguage=&rs%3ACommand=Render&rs%3AFormat=ATOM&rc%3ADataFeed=xAx0x13"><atom:title>table1</atom:title></collection></workspace></service>'
xml_data <- xmlParse(example_data)
# Convert the xml structure to a list so you can work with it in R
xml_list <- xmlToList(xml_data)
str(xml_list)
List of 1
$ workspace:List of 2
..$ title : chr "OperationallyAvailableCapacity"
..$ collection:List of 2
.. ..$ title : chr "table1"
.. ..$ .attrs: Named chr "http://10.101.111.234/ReportServer?%2FInfoPost%2FOperationallyAvailableCapacity&AssetNbr=51&beg_date=05%2F03%2F2013%2000%3A00%3"| __truncated__
.. .. ..- attr(*, "names")= chr "href"
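You can also grab the href straight from the service document with XPath rather than digging through the list. Below is a sketch using an abbreviated copy of your service document. Note that in well-formed XML a literal & inside an attribute has to be written as &amp;, so I've escaped it here (the names `svc_xml`, `feed_urls` and `feed_titles` are made up for the example):

```r
library(XML)

# Abbreviated copy of the service document; "&" in the href is escaped as "&amp;"
svc_xml <- '<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<service xmlns:atom="http://www.w3.org/2005/Atom" xmlns:app="http://www.w3.org/2007/app" xmlns="http://www.w3.org/2007/app">
  <workspace>
    <atom:title>OperationallyAvailableCapacity</atom:title>
    <collection href="http://10.101.111.234/ReportServer?%2FInfoPost%2FOperationallyAvailableCapacity&amp;AssetNbr=51&amp;rs%3AFormat=ATOM">
      <atom:title>table1</atom:title>
    </collection>
  </workspace>
</service>'

svc <- xmlParse(svc_xml, asText = TRUE)

# The app namespace is the default one here, so bind explicit prefixes for XPath
ns <- c(app = "http://www.w3.org/2007/app", atom = "http://www.w3.org/2005/Atom")

# href attribute and title of every collection listed in the service document
feed_urls   <- xpathSApply(svc, "//app:collection", xmlGetAttr, "href", namespaces = ns)
feed_titles <- xpathSApply(svc, "//app:collection/atom:title", xmlValue, namespaces = ns)
```

Since this is an AtomPub service document, that href should be the address of the actual data feed, so presumably something like `xmlParse(feed_urls[1])` is the next step. That assumes the report server is reachable from your machine and doesn't require authentication; if it does, you'd likely need RCurl to fetch the content first.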
EDIT
On closer inspection, it looks like your particular example data keeps most of its information in a single attribute, the href of the collection node, encoded as a URL. If you want that data, you're going to need to pull it out.
First, extract that attribute and decode the URL so it's easier to parse:
xml_content <- URLdecode(xml_list$workspace$collection$.attrs)
Your various parameters are separated by "&", so you can split the string on that character.
xml_content <- unlist(strsplit(xml_content, "&"))
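One caveat with decoding the whole URL before splitting: if a parameter value ever contains an encoded & (%26) or = (%3D), it will be split in the wrong place. A safer order is to split the raw query string first and decode each name and value afterwards. Here is a minimal sketch in base R; `parse_query` is a hypothetical helper, not a function from any package:

```r
# Hypothetical helper: turn a URL's query string into a named character vector,
# splitting on the raw "&" and "=" first and URL-decoding each piece afterwards
parse_query <- function(url) {
  query <- sub("^[^?]*\\?", "", url)                # drop everything up to the first "?"
  pairs <- strsplit(query, "&", fixed = TRUE)[[1]]
  keys  <- sub("=.*$", "", pairs)                   # text before the first "="
  vals  <- ifelse(grepl("=", pairs, fixed = TRUE),
                  sub("^[^=]*=", "", pairs),        # text after the first "="
                  "")                               # no "=" means an empty value
  decode <- function(x) vapply(x, URLdecode, character(1), USE.NAMES = FALSE)
  setNames(decode(vals), decode(keys))
}

# An encoded "&" inside a value survives intact
parse_query("http://example.com/report?name=A%26B&id=7")
```

With your data that would be `params <- parse_query(xml_list$workspace$collection$.attrs)`, and you can then index it as `params[["AssetNbr"]]`. The report path (everything before the first "&") shows up as the first name, with an empty value.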
Each new string contains both the parameter name and the value, separated by an equals sign. There are several ways to pull them apart; perhaps the easiest is the str_split_fixed function from the stringr package:
library(stringr)
str_split_fixed(xml_content, "=", 2)
[,1] [,2]
[1,] "http://10.101.111.234/ReportServer?/InfoPost/OperationallyAvailableCapacity" ""
[2,] "AssetNbr" "51"
[3,] "beg_date" "05/03/2013 00:00:00"
[4,] "LocationNbr" "%"
[5,] "LocationProp" "%"
[6,] "LocationName" "%"
[7,] "DirOfLow" "%"
[8,] "rs:ParameterLanguage" ""
[9,] "rs:Command" "Render"
[10,] "rs:Format" "ATOM"
[11,] "rc:DataFeed" "xAx0x13"
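The result is a two-column character matrix. If you want to look parameters up by name, one option is to drop the first row (the report URL itself) and turn the rest into a named vector. A sketch using a few of the strings above (abbreviated; `pairs` and `params` are just illustrative names):

```r
library(stringr)

# A few of the decoded strings from above (abbreviated for the example)
xml_content <- c(
  "http://10.101.111.234/ReportServer?/InfoPost/OperationallyAvailableCapacity",
  "AssetNbr=51",
  "LocationNbr=%",
  "rs:Format=ATOM"
)

# Split each string at the first "=" into name and value columns
pairs <- str_split_fixed(xml_content, "=", 2)

# Skip row 1 (the report URL, which has no "=") and key the values by parameter name
params <- setNames(pairs[-1, 2], pairs[-1, 1])

params[["AssetNbr"]]              # "51"
as.integer(params[["AssetNbr"]])  # the same value as a number
```

All values come back as character strings, so convert individual parameters (e.g. with `as.integer`) as needed.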