r nested-lists purrr discogs-api

Munging a recursive discogs list


Using the Discogs API, I obtain a list of releases by a given jazz musician like this:

releases <- list()
artists <- list()
artistURL <- "https://api.discogs.com/artists/"
library(jsonlite)
a <- function(artistcode){
  for(i in 0:3){
    artistset <- fromJSON(paste0(artistURL, artistcode, "/releases?page=", i))
    message("Retrieving page ", i)

    releases[[i+1]] <- (as.data.frame(artistset$releases.main_release))
      }
  return(artistset)
  message("Total rows=", dim(artistset[[2]])[1] )
}

temp<-a('265634') # art tatum 265634
temp$releases$title # shows first 50 albums...where's the rest?

Upon inspection, you will see that temp is a list of two elements, and the second is called releases. Within releases are 50 albums. However, my loop calls fromJSON for several pages, and the pagination info in temp reports 22 pages of results, yet I only get one page back:

str(temp$pagination)  # there are 22 pages of 50 lines per page

How do I extract all the titles and other data (22 pages' worth) for this artist into a data frame? I've been messing with purrr to no avail. Thanks for any help!


Solution

  • This should work better. In your version, the assignments to releases only modified a copy local to the function, and the function returned artistset (the last page fetched) rather than the accumulated list. I also changed the function to read the page count from the pagination element of the JSON and use it to construct the loop:

    library(jsonlite)
    artistURL <- "https://api.discogs.com/artists/"

    a <- function(artistcode){
      releases <- list()  # local accumulator, returned at the end
      # fetch page 1 first to learn how many pages exist
      metadata <- fromJSON(paste0(artistURL, artistcode, "/releases?page=", 1))
      for (i in seq_len(metadata$pagination$pages)){
        message("Retrieving page ", i)
        Sys.sleep(2) # pause between requests, added as I was being rate limited
        releases[[i]] <- fromJSON(paste0(artistURL, artistcode, "/releases?page=", i))$releases
      }
      return(releases)
    }

    temp <- a('265634') # art tatum 265634

    temp[[1]] # page 1
    temp[[2]] # page 2
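
The function above returns a list with one element per page, while the question asks for a single data frame. One possible way to get there is jsonlite's rbind_pages(), which stacks a list of data frames and fills columns missing from a page with NA. The sketch below uses two small mock pages so that it runs without calling the API; the column names (id, title, year, role) are illustrative assumptions, not the confirmed shape of the Discogs response.

```r
library(jsonlite)

# Mock stand-ins for two pages of release data; the columns are assumptions
# chosen for illustration, and page2 deliberately has a different column set.
page1 <- data.frame(id = c(1L, 2L), title = c("Album A", "Album B"),
                    year = c(1950L, 1953L), stringsAsFactors = FALSE)
page2 <- data.frame(id = 3L, title = "Album C", role = "Main",
                    stringsAsFactors = FALSE)
pages <- list(page1, page2)

# Stack all pages into one data frame; columns absent from a page become NA
all_releases <- rbind_pages(pages)

nrow(all_releases)   # total number of releases across pages
all_releases$title   # titles from every page
```

With the real data, the same call would presumably be all_releases <- rbind_pages(temp), after which all_releases$title should hold the titles from every page retrieved.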