Search code examples
rif-statementrowsdata-manipulation

Filtering a Data Frame based on Row Conditions


I thought of the following example to illustrate my question.

Suppose there are 5 balls:

  • Red
  • Blue
  • Green
  • Yellow
  • Orange

enter image description here

Normally, there are 5! = 120 ways these balls can be organized (n!). I can enumerate all these combinations below:

library(combinat)
library(dplyr)

my_list = c("Red", "Blue", "Green", "Yellow", "Orange")

d = permn(my_list)

all_combinations  = as.data.frame(matrix(unlist(d), ncol = 120)) %>%
  setNames(paste0("col", 1:120))


all_combinations[,1:5]

    col1   col2   col3   col4   col5
1    Red    Red    Red    Red Orange
2   Blue   Blue   Blue Orange    Red
3  Green  Green Orange   Blue   Blue
4 Yellow Orange  Green  Green  Green
5 Orange Yellow Yellow Yellow Yellow

My Question:

Suppose I wanted to filter this list by the following conditions:

  • The "Red" Ball can either be in the first or second position (From Left to Right)
  • There must be at least 2 positions between the "Blue" Ball and the "Green" Ball
  • The "Yellow" Ball can not be in the last position

I then tried to filter the above data based on these 3 conditons:

# attempt to write first condition
    cond_1 <- all_combinations[which(all_combinations[1,]== "Red" || all_combinations[2,] == "Red"), ]

#not sure how to write the second condition
    
 # attempt to write the third condition   
    cond_3 <- data_frame_version[which(data_frame_version[5,] !== "Yellow" ), ]

# if everything worked, an "anti join" style statement could be written to remove "cond_1, cond_2, cond_3" from the original data?

But this is not working at all - the first and third condition return a data frame only containing 4 rows for all the columns.

Can someone please show me how to correctly filter "all_combinations" using the above 3 filters?

Note:

The following code can transpose the original data:

 library(data.table)

    tpose = transpose(all_combinations)

    df = tpose
    
#group every 5 rows by the same id to identify unique combinations

    bloc_len <- 5
    
    df$bloc <- 
        rep(seq(1, 1 + nrow(df) %/% bloc_len), each = bloc_len, length.out = nrow(df))
    
   
 head(df)

      V1     V2     V3     V4     V5 bloc
1    Red   Blue  Green Yellow Orange    1
2    Red   Blue  Green Orange Yellow    1
3    Red   Blue Orange  Green Yellow    1
4    Red Orange   Blue  Green Yellow    1
5 Orange    Red   Blue  Green Yellow    1
6 Orange    Red   Blue Yellow  Green    2

Solution

  • You can do:

    library(tidyverse)
    tpose %>%
      mutate(blue_delete = case_when(V1 == "Blue" & V2 == "Green" ~ TRUE,
                                     V1 == "Blue" & V3 == "Green" ~ TRUE,
                                     V2 == "Blue" & V3 == "Green" ~ TRUE,
                                     V3 == "Blue" & V4 == "Green" ~ TRUE,
                                     V4 == "Blue" & V5 == "Green" ~ TRUE,
                                     TRUE ~ FALSE)) %>%
      filter(V3 != "Red" & V4 != "Red" & V5 != "Red",
             V5 != "Yellow",
             blue_delete == FALSE) %>%
      select(-blue_delete)