ggplot aes: alpha gets "smoothed out"

I'm using ggplot() from the ggplot2 R package with the mpg data set.

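library(ggplot2)  # provides ggplot() and the mpg data set
library(dplyr)    # provides %>% and mutate()

classify = function(cls){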
    if (cls == "suv" || cls == "pickup"){result = 1}
    else {result = 0}
    return(result)
}
mpg = mpg %>% mutate(size = sapply(class, classify))

ggplot(data = mpg) +
    geom_point(mapping = aes(x = displ, y = hwy, alpha = size))

Now, size can take only two values: 1 when class is suv or pickup, and 0 otherwise. But the resulting plot shows a weird, "smooth" range of alpha values.

(It's not the legend that surprises me, but the fact that there are actually values plotted with alpha 0.1 or 0.3 or whatever.)

What's going on?

Solution

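As for points that seem to be drawn with intermediate alpha values such as 0.3: there aren't any. Every point has one of just two alpha values. Because size is numeric, ggplot2 maps it onto a continuous alpha scale whose default range runs from 0.1 to 1, so the size = 0 points are drawn with alpha 0.1 (not invisibly) and the size = 1 points with alpha 1. Many cars in mpg have exactly the same (displ, hwy) coordinates, and where several of these semi-transparent points overlap, their opacities compound, so the stack looks like a single, darker point.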
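To check this, you can count how many cars share each (displ, hwy) pair and compute how opaque a stack of identical semi-transparent points looks. This is a minimal sketch. stacked_alpha is a hypothetical helper, and it uses the standard compositing result that n overlapping layers of opacity a give an overall opacity of 1 - (1 - a)^n.

```r
library(ggplot2)  # provides the mpg data set
library(dplyr)    # provides %>%, count() and filter()

# Which exact (displ, hwy) positions are shared by more than one car?
overlaps <- mpg %>%
    count(displ, hwy, name = "n_points") %>%
    filter(n_points > 1)
head(overlaps)

# Hypothetical helper: apparent opacity of n stacked points, each drawn with alpha a
stacked_alpha <- function(a, n) 1 - (1 - a)^n

# With a = 0.1 (the default lower end of the alpha range),
# 1, 2 and 3 stacked points look like alpha 0.1, 0.19 and 0.271
stacked_alpha(0.1, 1:3)
```

So two or three overlapping "alpha 0.1" points already look like a point drawn with alpha of roughly 0.2 or 0.3.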

To fix the legend, map alpha to a factor or a character vector (discrete) instead of a number (continuous), for example aes(alpha = factor(size)).
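Here is a sketch of that fix. The data frame name mpg2 and the two alpha levels (0.3 and 1) are arbitrary choices for illustration. ggplot2's default discrete alpha scale warns that alpha is not advised for discrete variables, so this sketch sets the two levels explicitly with scale_alpha_manual().

```r
library(ggplot2)
library(dplyr)

# Recreate the two-valued column; as.integer() turns TRUE/FALSE into 1/0
mpg2 <- mpg %>% mutate(size = as.integer(class %in% c("suv", "pickup")))

# Mapping a factor gives a discrete alpha scale, so the legend has exactly two keys
p <- ggplot(data = mpg2) +
    geom_point(mapping = aes(x = displ, y = hwy, alpha = factor(size))) +
    # Assumed alpha levels: 0.3 for other classes, 1 for suv/pickup
    scale_alpha_manual(values = c("0" = 0.3, "1" = 1), name = "size")
print(p)
```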

Unrelated, but your classify implementation is rather unidiomatic. First, since R is a functional language, every expression has a value, including if. So rather than assigning inside each branch of an if, you'd usually assign the result of the if:

result = if (cls == "suv" || cls == "pickup") 1 else 0

What's more, you need neither the result variable nor the return() call (in R, return() performs an early exit; otherwise a function returns the value of its last evaluated expression). An idiomatic R implementation looks like this:

classify = function(cls) {
    if (cls == "suv" || cls == "pickup") 1 else 0
}

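Better yet, use the vectorised ifelse() instead of the scalar if (together with the element-wise | instead of the scalar ||), so that classify works on a whole vector at once: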

classify = function(cls) {
    ifelse(cls == "suv" | cls == "pickup", 1, 0)
}
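To see that the vectorised version gives the same result as calling the scalar one through sapply(), you can compare them on a small vector. The names classify_scalar and classify_in are hypothetical; classify_in shows an alternative using %in% that is not part of the original answer.

```r
# Scalar version: needs sapply() because || only works on single values
# (recent R versions signal an error if || receives a longer vector)
classify_scalar <- function(cls) {
    if (cls == "suv" || cls == "pickup") 1 else 0
}

# Vectorised version: ifelse() and | work element by element
classify <- function(cls) {
    ifelse(cls == "suv" | cls == "pickup", 1, 0)
}

# Hypothetical alternative: %in% tests membership in a set of classes
classify_in <- function(cls) {
    as.numeric(cls %in% c("suv", "pickup"))
}

classes <- c("compact", "suv", "pickup", "midsize")
classify(classes)                         # 0 1 1 0
unname(sapply(classes, classify_scalar))  # same values; sapply() adds names
classify_in(classes)                      # same values again
```

The %in% form scales more gracefully if more classes need to count as "large".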

Now you can call classify on the whole column without sapply:

mpg = mpg %>% mutate(size = classify(class))