I'm using ggplot in the ggplot2 R package, with the mpg data set.
library(ggplot2)
library(dplyr)
classify = function(cls){
  if (cls == "suv" || cls == "pickup"){result = 1}
  else {result = 0}
  return(result)
}
mpg = mpg %>% mutate(size = sapply(class, classify))
ggplot(data = mpg) +
  geom_point(mapping = aes(x = displ, y = hwy, alpha = size))
Now, size can take only two values: 1 when class is suv or pickup, and 0 otherwise. But I get a weird "smooth" range of sizes in the resulting plot.
(It's not the legend that surprises me, but the fact that there are actually values plotted with alpha 0.1 or 0.3 or whatever.)
What's going on?
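There aren’t any points plotted with intermediate alpha values. Because size is numeric, ggplot2 treats it as continuous and, with its default alpha range of 0.1 to 1, draws the points with size 0 at alpha 0.1 and those with size 1 at alpha 1. What you’re seeing is that many points have exactly the same coordinates (displ and hwy only take a limited set of values), so the semi-transparent points overlap and stacks of them look more opaque than a single point.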
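One way to check this, sketched here under the assumption that ggplot2 and dplyr are installed, is to ask ggplot2 which alpha values it actually computed (via ggplot_build()) and to count how many cars share each (displ, hwy) pair. The name mpg2 is only an illustrative copy so the built-in data set stays untouched. With the default continuous alpha range of 0.1 to 1, only two distinct alpha values should appear.

```r
library(ggplot2)
library(dplyr)

# Same 0/1 column as in the question, computed with a vectorised test.
mpg2 <- mpg %>%
  mutate(size = ifelse(class == "suv" | class == "pickup", 1, 0))

p <- ggplot(data = mpg2) +
  geom_point(mapping = aes(x = displ, y = hwy, alpha = size))

# Alpha values ggplot2 assigned to the individual points.
alphas <- ggplot_build(p)$data[[1]]$alpha
print(sort(unique(alphas)))

# Coordinates shared by more than one car, i.e. where points are stacked.
stacked <- mpg2 %>%
  count(displ, hwy) %>%
  filter(n > 1) %>%
  arrange(desc(n))
print(head(stacked))
```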
To fix the legend, map alpha to a factor or a character vector (which ggplot2 treats as discrete) instead of a number (which it treats as continuous).
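A minimal sketch of that fix, assuming ggplot2 and dplyr are installed: the 0/1 column is wrapped in factor(), so the legend shows exactly two entries. Because ggplot2's default discrete alpha scale may warn that mapping alpha to a discrete variable is not advised, this sketch sets the two alpha values explicitly with scale_alpha_manual(); the values 0.3 and 1 are arbitrary choices, not from the original answer.

```r
library(ggplot2)
library(dplyr)

# Store the two groups as a factor so ggplot2 uses a discrete alpha scale.
mpg2 <- mpg %>%
  mutate(size = factor(ifelse(class == "suv" | class == "pickup", 1, 0)))

# One alpha value per factor level; names must match the levels "0" and "1".
p <- ggplot(data = mpg2) +
  geom_point(mapping = aes(x = displ, y = hwy, alpha = size)) +
  scale_alpha_manual(values = c("0" = 0.3, "1" = 1))
print(p)
```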
Unrelated, but your classify implementation is rather unorthodox. First of all, R is a functional language in which every expression, including if, evaluates to a value. So rather than performing an assignment inside each branch of the if, you’d usually assign the result of the if itself:
result = if (cls == "suv" || cls == "pickup") 1 else 0
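To make the "if is an expression" point concrete, here is a small sketch using a single hypothetical class value. It also shows a related detail of base R: an if without an else evaluates to NULL when its condition is false.

```r
# `if` is an expression, so its value can be assigned directly.
cls <- "pickup"
result <- if (cls == "suv" || cls == "pickup") 1 else 0
print(result)

# Without an `else`, a false condition makes the whole `if` evaluate to NULL.
nothing <- if (cls == "compact") 1
print(is.null(nothing))
```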
What’s more, you need neither the result variable nor the return function call (which in R performs an early exit): a function returns the value of the last expression it evaluates. An idiomatic R implementation would therefore look like this:
classify = function(cls) {
  if (cls == "suv" || cls == "pickup") 1 else 0
}
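For contrast, the sketch below puts the idiomatic version next to a hypothetical classify_early (not part of the original answer) that uses return() only for its intended purpose, leaving the function early. Both are still scalar functions, so applying them to several classes here goes through sapply().

```r
# Idiomatic scalar version: the value of the `if` is returned implicitly.
classify <- function(cls) {
  if (cls == "suv" || cls == "pickup") 1 else 0
}

# Hypothetical variant: return() exits early; otherwise the last value is returned.
classify_early <- function(cls) {
  if (cls == "suv" || cls == "pickup") return(1)
  0
}

classes <- c("suv", "compact", "pickup")
print(sapply(classes, classify, USE.NAMES = FALSE))
print(sapply(classes, classify_early, USE.NAMES = FALSE))
```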
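Better yet, use the vectorised ifelse instead of the non-vectorised if. The condition then has to use the element-wise | instead of ||, which only works on single values: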
classify = function(cls) {
  ifelse(cls == "suv" | cls == "pickup", 1, 0)
}
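A quick sketch of the vectorised version applied to a whole character vector at once; the example classes are made up for illustration.

```r
classify <- function(cls) {
  # `|` and `ifelse()` both operate element-wise on whole vectors.
  ifelse(cls == "suv" | cls == "pickup", 1, 0)
}

classes <- c("suv", "compact", "pickup", "midsize")
print(classify(classes))
```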
Now you can call classify on the whole column directly, without sapply:
mpg = mpg %>% mutate(size = classify(class))
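To confirm that the refactoring does not change the result, this sketch (assuming ggplot2 and dplyr are installed) builds the column both ways and compares them. The name classify_scalar is hypothetical and refers to the non-vectorised if version; old and new are illustrative copies of the data.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)

# Non-vectorised version, applied element by element with sapply().
classify_scalar <- function(cls) {
  if (cls == "suv" || cls == "pickup") 1 else 0
}
# Vectorised version, applied to the whole column at once.
classify <- function(cls) {
  ifelse(cls == "suv" | cls == "pickup", 1, 0)
}

old <- mpg %>% mutate(size = sapply(class, classify_scalar, USE.NAMES = FALSE))
new <- mpg %>% mutate(size = classify(class))

print(identical(old$size, new$size))
print(table(new$size))
```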