Search code examples
rdataframeconditional-statementssubset

Automatically filter data by two conditions


Let's say we have data from different schools with different classes, like this:

df <- data.frame(school= rep(c("Oldenburg", "Berlin", "Homburg"), each= 4), class= rep(LETTERS[1:4], 3))

Now I need to subset the data, for example we may need the data from school Berlin class B and school Oldenburg class A. What we can do is

subset(df, school == "Berlin" & class == "B" | school== "Oldenburg" & class == "A")

But in other situations we need other classes of other schools and there might be some new school in the data or new class, too. Therefore I can not always actually write the names of the schools and classes by hand but need to subset the data automatically by two conditions.

My attempt is to provide two vectors of same length, one saying the name of the schools and the other the name of the classes. For the example above it would be schools <- c("Berlin", "Oldenburg") and classes <- c("B", "A"). I couldn't find any solution for this.. Any ideas?


Solution

  • I suggest creating a data.frame with the pairwise schools/classes you need to filter on, and then do an inner-join on them.

    need <- data.frame(school = c("Berlin", "Oldenburg"), class = c("B", "A"))
    need
    #      school class
    # 1    Berlin     B
    # 2 Oldenburg     A
    
    ### for kicks, just to show that other things in `df` are retained
    df$num <- 1:nrow(df)
    merge(df, need, by = c("school", "class"))
    #      school class num
    # 1    Berlin     B   6
    # 2 Oldenburg     A   1
    

    One reason I prefer making a frame with these is that it's often easier to maintain: it's a frame, which can be kept as a CSV or similar for easy updates, cross-platform as well. It allows for easy frame-ops as well, for instance if you want (say) just one row's worth, them merge(df, need[2,], ...) or similar.

    For joins, see: