I am exploring data from the Pokemon API (not actually using the API, just pulling the .csv files from the GitHub repository). A file called pokemon_types.csv contains the types of every Pokemon in narrow format (a Pokemon can have up to two types), with the types encoded as integers (essentially factors). I want to label these levels by using a lookup table (types.csv), also from the API, that contains the levels as an id (1, 2, 3, etc.) and a corresponding identifier (normal, fighting, flying, etc.) which I want to use as the label.
> head(read_csv(path("pokemon_types.csv")), 10)
# A tibble: 10 x 3
   pokemon_id type_id  slot
        <dbl>   <dbl> <dbl>
 1          1      12     1
 2          1       4     2
 3          2      12     1
 4          2       4     2
 5          3      12     1
 6          3       4     2
 7          4      10     1
 8          5      10     1
 9          6      10     1
10          6       3     2
> head(read_csv(path("types.csv")))
# A tibble: 6 x 4
     id identifier generation_id damage_class_id
  <dbl> <chr>              <dbl>           <dbl>
1     1 normal                 1               2
2     2 fighting               1               2
3     3 flying                 1               2
4     4 poison                 1               2
5     5 ground                 1               2
6     6 rock                   1               2
My code works when I pipe all of the steps individually, but since I am going to perform this labeling step a dozen times or so, I tried to put it into a function. The problem is that when I call the function instead (which has exactly the same steps as far as I can tell), it throws an "object not found" error.
The Setup:
library(readr)
library(magrittr)
library(dplyr)
library(tidyr)
options(readr.num_columns = 0)
# Append web directory to filename
path <- function(x) {
  paste0("https://raw.githubusercontent.com/",
         "PokeAPI/pokeapi/master/data/v2/csv/", x)
}
The offending function:
# Use lookup table to label factor variables
label <- function(data, variable, lookup) {
  mutate(data, variable = factor(variable,
                                 levels = read_csv(path(lookup))$id,
                                 labels = read_csv(path(lookup))$identifier))
}
This version, which doesn't use the function, works:
df.types <-
  read_csv(path("pokemon_types.csv")) %>%
  mutate(type_id = factor(type_id,
                          levels = read_csv(path("types.csv"))$id,
                          labels = read_csv(path("types.csv"))$identifier)) %>%
  spread(slot, type_id)
head(df.types)
It returns:
# A tibble: 6 x 3
  pokemon_id `1`   `2`
       <dbl> <fct> <fct>
1          1 grass poison
2          2 grass poison
3          3 grass poison
4          4 fire  NA
5          5 fire  NA
6          6 fire  flying
This version, which uses the function, does not:
df.types <-
  read_csv(path("pokemon_types.csv")) %>%
  label(type_id, "types.csv") %>%
  spread(slot, type_id)
It returns:
Error in factor(variable,
                levels = read_csv(path(lookup))$id,
                labels = read_csv(path(lookup))$identifier) :
  object 'type_id' not found
I know that there are several things that may be sub-optimal here (downloading the lookup table twice on every call, for instance), but I am more interested in why wrapping seemingly identical code in a function makes it stop working. I am sure I am just making a silly mistake.
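As far as I can tell now, the difference is where type_id gets looked up. Written inline, mutate() finds type_id among the columns of the data. Inside the function, mutate() sees the symbol variable, finds no column by that name, falls back to the function argument, and evaluating that argument means looking for an object called type_id in the calling environment, where none exists. A minimal sketch that reproduces this without any downloads (the toy tibble and the name broken_label are made up for illustration):

```r
library(dplyr)

# Toy stand-in for pokemon_types.csv (a few rows copied from the output above)
toy_types <- tibble(pokemon_id = c(1, 1, 4), type_id = c(12, 4, 10))

# Same shape as the failing function, minus the lookup download
broken_label <- function(data, variable) {
  # `variable` is not a column of `data`, so mutate() evaluates the
  # argument itself, i.e. it looks for `type_id` outside the data frame
  mutate(data, variable = factor(variable))
}

# Capture the error message instead of stopping the script
result <- tryCatch(
  broken_label(toy_types, type_id),
  error = function(e) conditionMessage(e)
)
print(result)
```

Note also that even if the lookup succeeded, variable = ... on the left-hand side would create a column literally named variable rather than overwrite type_id.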
Thanks to the helpful comments I was able to learn all about non-standard evaluation and figure out a solution:
label <- function(data, variable, lookup) {
  variable <- enquo(variable)
  data %>%
    mutate(!!variable := factor(!!variable,
                                levels = read_csv(path(lookup))$id,
                                labels = read_csv(path(lookup))$identifier))
}
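The key features are enquo(), which captures ("quotes") the expression passed as the argument instead of evaluating it; !!, which "unquotes" that captured expression so that mutate() sees the column name I actually passed in; and :=, which is needed because unquoting is not allowed on the left-hand side of a regular = (the right-hand side can be unquoted either way).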
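For what it's worth, newer versions of rlang (0.4.0 and later, as far as I know) offer the {{ }} "embrace" shorthand, which does the enquo() and !! steps in one go. Below is a sketch under that assumption. The name label_embrace and the toy tables are hypothetical stand-ins (values copied from the output above), and the lookup is passed in as a data frame so it only has to be read once:

```r
library(dplyr)

# Hypothetical variant of label(): {{ variable }} replaces enquo() + !!,
# and the lookup table is supplied as a data frame instead of a file name
label_embrace <- function(data, variable, lookup_tbl) {
  mutate(data, {{ variable }} := factor({{ variable }},
                                        levels = lookup_tbl$id,
                                        labels = lookup_tbl$identifier))
}

# Toy stand-ins for pokemon_types.csv and types.csv
toy_types  <- tibble(pokemon_id = c(1, 1, 4, 6),
                     type_id    = c(12, 4, 10, 3))
toy_lookup <- tibble(id         = c(3, 4, 10, 12),
                     identifier = c("flying", "poison", "fire", "grass"))

labelled <- label_embrace(toy_types, type_id, toy_lookup)
print(labelled)
```

With the real data, the call would presumably look like label_embrace(type_id, read_csv(path("types.csv"))) in the middle of the pipe.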
I tried and failed to implement a solution that avoided dplyr entirely, but at least this works.
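In hindsight, one way to avoid dplyr might be to pass the column name as a string and index with [[ ]], which sidesteps non-standard evaluation altogether. This is only a sketch: label_base and the toy data frames are hypothetical (values copied from the output above), and the function expects an already-loaded lookup table rather than a file name:

```r
# Hypothetical base-R variant: `variable` is a character string here
label_base <- function(data, variable, lookup_tbl) {
  # [[ ]] looks the column up by name, so no quoting tricks are needed
  data[[variable]] <- factor(data[[variable]],
                             levels = lookup_tbl$id,
                             labels = lookup_tbl$identifier)
  data
}

# Toy stand-ins for pokemon_types.csv and types.csv
toy_types  <- data.frame(pokemon_id = c(1, 1, 4, 6),
                         type_id    = c(12, 4, 10, 3))
toy_lookup <- data.frame(id         = c(3, 4, 10, 12),
                         identifier = c("flying", "poison", "fire", "grass"),
                         stringsAsFactors = FALSE)

labelled_base <- label_base(toy_types, "type_id", toy_lookup)
print(labelled_base)
```

The trade-off is that the column has to be quoted at the call site, so it reads a little differently from the rest of a dplyr pipeline.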