Search code examples
rrcurlgoogle-books

In R, convert character object to list or dataframe using pattern to extract colnames and values


I have a list of book titles and authors, and I am using the Google books API to access additional information about the books (e.g. complete title, ISBNs, etc.) Ultimately, I want to copy the information from Google into my original list only if the author names field of the first item returned by Google includes the name in my author name in my original list.

My question is about whether there is a simple way to convert the result of the query (which is a character object) into a table or dataframe based on patterns in the google result. Below is an example.

library(RCurl)
result<-getURL("https://www.googleapis.com/books/v1/volumes?q=fellowship%20of%20the%20ring%20tolkien&startIndex=0",ssl.verifyhost=F,ssl.verifypeer=F,followlocation=T)

print(result)

This leads to this result:

[1] "{\n \"kind\": \"books#volumes\",\n \"totalItems\": 717,\n \"items\": [\n {\n \"kind\": \"books#volume\",\n \"id\": \"aWZzLPhY4o0C\",\n \"etag\": \"UKfRIR+5nhY\",\n \"selfLink\": \"https://www.googleapis.com/books/v1/volumes/aWZzLPhY4o0C\",\n \"volumeInfo\": {\n \"title\": \"The Fellowship of the Ring\",\n \"subtitle\": \"Being the First Part of The Lord of the Rings\",\n \"authors\": [\n \"J.R.R. Tolkien\"\n ],\n \"publisher\": \"Houghton Mifflin Harcourt\",\n \"publishedDate\": \"2012-02-15\",\n \"description\": \"The first volume in J.R.R. Tolkien's epic adventure THE LORD OF THE RINGS One Ring to rule them all, One Ring to find them, One Ring to bring them all and in the darkness bind them In ancient times the Rings of Power were crafted by the Elven-smiths, and Sauron, the Dark Lord, forged the One Ring, filling it with his own power so that he could rule all others. But the One Ring was taken from him, and though he sought it throughout Mid...

I would like to convert the resulting character object to a list or table or dataframe, and for the most part,

column names enclosed in " ", preceded on the left by a line return \n, and followed by ":" on the right row values enclosed in " ", preceded on the left by ": ", and follwed ",\n" on the right

But some fields, like the ISBNs, don't follow the pattern exactly.

for example, I'd like result.df to look like:

kind    title   subtitle    authors publisher   publishedDate description   ISBN_13 ISBN_10
"books#volume"  "The Fellowship of the Ring" "Being the First Part of The Lord of the Rings" 
 "J.R.R. Tolkien" "Houghton Mifflin Harcourt" "2012-02-15" "The first volume in J.R.R. Tolkien's epic adventure THE LORD OF THE RINGS One Ring to rule them all, One Ring to find them, One Ring to bring them all and in the darkness bind them In ancient times the Rings of Power were crafted by the Elven-smiths, and Sauron, the Dark Lord, forged the One Ring, filling it with his own power so that he could rule all others. But the One Ring was taken from him, and though he sought it throughout Middle-earth, it remained lost to him. After many ages it fell into the hands of Bilbo Baggins, as told in The Hobbit. In a sleepy village in the Shire, young Frodo Baggins finds himself faced with an immense task, as his elderly cousin Bilbo entrusts the Ring to his care. Frodo must leave his home and make a perilous journey across Middle-earth to the Cracks of Doom, there to destroy the Ring and foil the Dark Lord in his evil purpose. “A unique, wholly realized other world, evoked from deep in the well of Time, massively detailed, absorbingly entertaining, profound in meaning.” – New York Times" "9780547952017" "0547952015"

Ultimately, I want to be able to copy values from the new list/table/dataframe to another dataframe, if certain values match (e.g., the authors value includes a match to a value in another dataframe), similar to this excerpt from a loop:

if(grepl(books$auth1last[i],result.df$authors[1])==TRUE){
    books$isbn13[i] = result.df$isbn13[1] 
}else{
    books$isbn13[i] = NA} 

Is there an elegant way to convert the character object into something more like an organized list/table/df with just a few lines, or will I have to extract each column name and value with a separate line using something like rm_between? Thanks!


Solution

  • You can convert the returned string of json into a list using the jsonlite package. You just need to remove the line breaks for it to work.

    example:

    library(RCurl)
    result <- getURL("https://www.googleapis.com/books/v1/volumes?q=fellowship%20of%20the%20ring%20tolkien&startIndex=0",ssl.verifyhost=F,ssl.verifypeer=F,followlocation=T)
    
    result_no_breaks <- gsub("\\n", " ",result)
    json_list <- jsonlite::fromJSON(result_no_breaks)