
How do you read the %in% operator in plain English?


I'm struggling with how to read the %in% operator in R in "plain English" terms. I've seen multiple examples of code for its use, but not a clear explanation of how to read it.

For example, I've found terminology for the pipe operator %>% that suggests to read it as "and then." I'm looking for a similar translation for the %in% operator.

In the book R for Data Science in chapter 5 titled "Data Transformation" there is an example from the flights data set that reads as follows:

The following code finds all flights that departed in November or December:

filter(flights, month == 11 | month == 12)

A useful short-hand for this problem is x %in% y. This will select every row where x is one of the values in y. We could use it to rewrite the code above:

nov_dec <- filter(flights, month %in% c(11, 12))

When I read "a useful short-hand for this problem is x %in% y," and then look at the nov_dec example, it seems like this is to be understood as "select every row where month (x) is one of the values in c(11,12) (y)," which doesn't make sense to me.

My brain, however, wants to read it as something like, "Look for 11 and 12 in the month column." On that reading, x would be the values 11 and 12, and the %in% operator would check whether those values are in y, which would be the month column. In other words, my brain is reading this example from right to left.

However, all of the code examples I've found seem to indicate that this x %in% y should be read left to right and not right to left.

Can anyone help me read the %in% operator in layman's terms please? Examples would be appreciated.


Solution

  • I think the disconnect is in how to apply "in" to a vector. You wrote that you want to read it as "Look for 11 and 12 in the month column." You can indeed think of it that way. Your example was:

    nov_dec <- filter(flights, month %in% c(11, 12))
    

    And that could be expressed in plain English as:

    Give me all the flights where one of the values in c(11, 12) is in the month column

    But we could also say that 11 and 12 are "in" the vector c(11, 12). That's what the left-to-right reading would be:

    Give me all the flights whose month is in the vector c(11, 12).

    Or, expressed slightly differently and more verbosely:

    Give me all the flights whose month is equal to one of the values in the vector c(11, 12)
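
    To make the two readings concrete, here is a minimal base R sketch. The month vector below is a made-up stand-in for the month column of flights, not the real data. It suggests why the left-to-right reading is the one that works as a filter: the result has one TRUE/FALSE per element of the left-hand side.

    ```r
    # Hypothetical stand-in for flights$month: one entry per flight.
    month <- c(1, 11, 12, 3, 11)

    # Left to right: for each element of month, is it one of 11 or 12?
    keep <- month %in% c(11, 12)
    print(keep)      # one TRUE/FALSE per flight

    # Right to left asks a different question: for each of 11 and 12,
    # does it appear anywhere in month?
    reversed <- c(11, 12) %in% month
    print(reversed)  # one TRUE/FALSE per value in c(11, 12)

    # Only the first result lines up with the rows, so only it can pick flights.
    nov_dec <- month[keep]
    print(nov_dec)
    ```

    The reversed form only tells you whether any November or December flights exist at all, not which rows they are.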

    Using %in% is conceptually similar to chaining | operators (month == 11 | month == 12), but it's best not to think of the two as exactly equivalent. Instead of explicitly comparing x to every value in y, you're asking the question "is x equal to one of the values in y?" That's different in the same way that saying "please turn off the lights" is different from saying "please walk over to that plate on the wall and pull the little stick on it downwards." It expresses what you want instead of how to figure it out, which makes your code more readable. Code is read more often than it's written, so that matters.
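
    One place where the two forms visibly differ is missing values. The sketch below uses a made-up month vector containing an NA (an assumption for illustration, not the flights data). As far as I know, dplyr's filter() drops rows where the condition is NA, so both forms would select the same rows there, but the logical vectors themselves are not identical.

    ```r
    # Hypothetical month values, including one missing value.
    month <- c(1, 11, NA, 12)

    # Chained comparisons: comparing NA with == gives NA, and NA | NA stays NA.
    with_or <- month == 11 | month == 12

    # %in% never returns NA: a missing month is simply "not one of 11, 12".
    with_in <- month %in% c(11, 12)

    print(with_or)
    print(with_in)
    ```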

    Now I'm getting way out of my area (I don't know what R actually does here), but the underlying method of answering the question might also be different. For example, it might use a binary search algorithm to find out if x is in y.
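
    For what it's worth, R's help page for match() documents %in% as match(x, table, nomatch = 0) > 0, so the lookup strategy is whatever match() does internally and not necessarily a binary search. A small sketch, with made-up vectors x and y, that rebuilds the result by hand:

    ```r
    # Hypothetical inputs: x plays the role of month, y the values to look for.
    x <- c(1, 11, 12, 3)
    y <- c(11, 12)

    # match() returns the position of each element of x in y,
    # or 0 (our chosen nomatch value) when the element is not found.
    pos <- match(x, y, nomatch = 0L)
    print(pos)

    # A position greater than 0 means "found", which is what %in% reports.
    by_hand <- pos > 0L
    print(identical(by_hand, x %in% y))
    ```

    Read this way, x %in% y is "for each x, was a match found in y?", which is the same left-to-right reading as before.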