I'm learning RStudio and this is for a project I'm working on. I'm using the Marvel API to get a list of all of the characters in the universe, using a for loop to call the API multiple times. The API limits you to 100 results per call, so I'm iterating and setting the offset on each pass through the loop.
My code was working for the last few days, but when I loaded it up today and tried to fetch the data, I got this error: "Error in rbind(deparse.level, ...) : invalid list argument: all variables should have the same length"
Here's the code that I'm running:
MarvelUniverse = data.frame()
y = 1
offset = c(0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400)
for(x in offset){
  partialUrl = "https://gateway.marvel.com:443/v1/public/characters?ts=1&apikey={apiKey}&hash={hash}&limit=100&offset="
  url = paste(partialUrl, offset, sep="")
  call = httr::GET(url[y])
  query = httr::content(call, as="raw")
  name = jsonlite::fromJSON(rawToChar(query))
  df = flatten(as.data.frame(name))
  MarvelUniverse = rbind(df, MarvelUniverse)
  y = y + 1
}
Since I was binding to an empty data frame before, I don't understand why this worked then but fails now. I did notice that df is now holding 1618 elements, where it was holding ~1498 before the break. The name list is still a large list of 7 elements, so that seems to be unchanged.
EDIT
I found that if I remove flatten from df = flatten(as.data.frame(name)), the call will execute over the first iteration of the loop, but now I'm running into an issue with duplicate row.names. I tried setting those to NULL, but no luck so far.
A few suggestions:
Don't rbind() data frames one at a time in a loop - this is a classic anti-pattern. It's called "growing an object" and is the 2nd Circle of Hell in the R Inferno. It's very inefficient. Rather, you should put all your data frames in a list, and rbind them all at once at the end.
You're using two looping variables in the same loop: x and y. x is the one specified in for(), and y you are keeping track of manually. This is prone to bugs - just use one looping variable (almost always it's best to have the looping variable go over 1, 2, 3, ..., n).
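To illustrate the second point with toy values (no API call involved): paste() is vectorized, so pasting the whole offset vector builds every URL at once, which is presumably why the original loop needed a separate y to pick one out. A minimal sketch, where base_url is a made-up placeholder and not the real endpoint:

```r
# Placeholder base URL for illustration only; not the real Marvel endpoint.
base_url <- "https://example.com/characters?limit=100&offset="

# Same values as the hand-typed offset vector: 0, 100, ..., 1400.
offset <- seq(0, 1400, by = 100)

# paste0() is vectorized: this yields one URL per offset.
urls <- paste0(base_url, offset)

# A single index is enough to walk the offsets and the URLs together.
for (i in seq_along(offset)) {
  stopifnot(urls[i] == paste0(base_url, offset[i]))
}

length(urls)
```

Either build one URL per iteration from offset[i], or build the whole vector once and index it with i. Doing both at the same time is what goes wrong.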
Addressing these issues, I would try this code:
MarvelUniverseList = list()
offset = c(0, 100, 200, 300, 400, 500, 600, 700, 800, 900, 1000, 1100, 1200, 1300, 1400)
## the base URL never changes, so define it once outside the loop
partialUrl = "https://gateway.marvel.com:443/v1/public/characters?ts=1&apikey={apiKey}&hash={hash}&limit=100&offset="
for(i in seq_along(offset)){
  ## build the URL for this page only, so it needs no further indexing
  url = paste(partialUrl, offset[i], sep="")
  call = httr::GET(url)
  query = httr::content(call, as="raw")
  name = jsonlite::fromJSON(rawToChar(query))
  ## flatten() is kept as in the question (presumably jsonlite::flatten)
  MarvelUniverseList[[i]] = flatten(as.data.frame(name))
}
## combine at end
MarvelUniverse = do.call(rbind, MarvelUniverseList)
## more efficient and flexible version from dplyr
MarvelUniverse = dplyr::bind_rows(MarvelUniverseList)
Of course, without an API key, I can't test this or see what your problems are. dplyr::bind_rows is a bit more flexible than rbind, so it may solve your problem. But this approach also has the advantage that if there is an issue combining the data frames, you have the individual data frames stored in the list and you can inspect/debug/fix them so that they can be combined.
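As a rough sketch of what that inspection could look like, here is a base R example on toy data. The pages list and its columns are invented stand-ins for the per-page results, and the extra column is only an assumed example of how one page might differ from the others:

```r
# Toy stand-ins for per-page results; the second page has an extra column,
# the kind of mismatch that can make combining fail.
pages <- list(
  data.frame(id = 1:2, name = c("A", "B")),
  data.frame(id = 3:4, name = c("C", "D"), extra = c(TRUE, FALSE))
)

# Compare shapes first: number of columns in each page.
n_cols <- vapply(pages, ncol, integer(1))

# Find the columns that are not shared by every page.
all_cols <- Reduce(union, lapply(pages, names))
common_cols <- Reduce(intersect, lapply(pages, names))
odd_cols <- setdiff(all_cols, common_cols)

# Base rbind() refuses data frames with different numbers of columns;
# tryCatch() captures the error instead of stopping the script.
rbind_result <- tryCatch(do.call(rbind, pages), error = function(e) e)

# Binding only the shared columns works.
combined <- do.call(rbind, lapply(pages, function(p) p[common_cols]))
```

Here odd_cols points at the column that breaks rbind, and combined keeps only the columns every page has. dplyr::bind_rows takes the other route: it keeps every column and fills the gaps with NA.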