Search code examples
jsonrazurehttr

Extracting data from an API using R


I have access to some telemetry data in azure (specifically all the API calls for customers using a mobile app). I have used the httr package in R to request the data over a 3 minute period and assess the response like so (obviously have my own app ID and key which I have not included below):

install.packages("httr")
library(httr)

r1 <- GET("https://api.applicationinsights.io/v1/apps/application-ID/query?timespan=PT0.05H&query=requests", add_headers("X-Api-Key" = "my-unique-key"))

r1

####### response object ########
# Response [https://api.applicationinsights.io/v1/apps/application-ID/query?timespan=PT0.05H&query=requests]
# Date: 2018-01-11 15:55
# Status: 200
# Content-Type: application/json; charset=utf-8
# Size: 84.7 kB

In the environment window, I can see r1 is a list of 10 and that there are 84,652 raw values:

enter image description here

I can also use the content function to see I have a list of 1:

r2 <- content(r1)

enter image description here

I have two questions really:

1) How do I make sense of these outputs in the environment window? What do they tell me about the structure of my data (I think it's JSON based on content type description)

2) Is there a way to retrieve the data and get it into tabular format (a data frame)? I don't understand how to query the data. I read this article, but couldn't apply it to my data: https://tclavelle.github.io/blog/r_and_apis/

Any help would be appreciated.


UPDATE 19/01/18

I used jalind's suggestion. See below for code and outputs:

library(httr)
library(jsonlite)

r1 <- GET("https://api.applicationinsights.io/v1/apps/application-ID/query?timespan=PT0.05H&query=requests", add_headers("X-Api-Key" = "my-unique-key"))

#convert to a character string
r2 <- rawtoChar(r1$content)  

#check the class is character
class(r2)    

# now extract JSON from string object
r3 <- fromJSON(r2)

# convert to a data frame - this returns a data frame with columns called name, columns and rows 
x <- as.data.frame(r3[[1]])  

# column headings data frame (there are 37 columns - see example of first 3 columns below):               
c <- as.data.frame(x$columns)

#                       name      type
#                  timestamp    datetime
#                         id     string
#                     source     string

# data frame with 37 columns and all rows of telemetry data (only showing first 4 columns of this data frame):

r <- as.data.frame(x$rows)
#           X1                               X2                X3                  X4
# 1   2018-01-19T10:29:25.4Z       |aticCNxxxx=.f83assss_     <NA>          GET /Cards/Cardtype1
# 2   2018-01-19T10:29:30.226Z     |tX6Xz0xxxxx=.27cxcxae_    <NA>          GET /AddressLookup/Address
# 3   2018-01-19T10:29:45.327Z     |OgfPbicLues=.f83a9a1f_    <NA>          POST /Account/MobileDevice
# 4   2018-01-19T10:29:46.078Z     |V5MwpXXxxxxx=.f83axxxx_   <NA>          GET /Cards/Cardtype1
# 5   2018-01-19T10:30:00.427Z     |Jok8wxxxxxx=.7be33aaa_    <NA>          GET /cards/Cardtype1

Solution

  • p0bs is right - you should check out the jsonlite package.

    I don't know if I can entirely help you with the first part of your question, but I might be able to help you get JSON into a data frame.

    When you apply the GET function to a URL the raw content that is returned is in hex.

    raw.result <- GET(url = url, path = path)
    head(raw.result$content) ## This is in hex
    

    Hex is difficult to work with, so one thing you might do is convert the content into a string. You can do this with the rawToChar function.

    text.raw.content <- rawToChar(raw.result$content)
    class(text.raw.content) ## Now its a string
    nchar(text.raw.content) ## How many chars?
    

    OK - so now you have a string... which is better than hex... but still not what you are looking for. But you can use the fromJSON function in the jsonlite package to extract the JSON from the string object.

    json.content <- fromJSON(text.raw.content)
    class(json.content) ## It's a list
    length(json.content) ## With two elements
    names(json.content) ## meta and data... makes sense...
    class(json.content[[2]]) ## data.frame
    

    So, basically, the second element of this list is the JSON content converted to a native R data frame. In my experience there is quite a bit more munging that must take place after you get this far... but hopefully this gets you started.