
Select groups that have + and - z score values


I'm looking to group my data by the column type and select all rows of each group that has BOTH a positive and a negative z-score. In other words, I want to keep a type group only if its z-scores fall both above and below 0.

In the example data test, I would expect the types "Translation", "Signal Transduction", "Glycan Biosynthesis and Metabolism", and "Genetic Information Processing" to be separated out. I'm really struggling with where to start; below is my best starting point.

out <- test %>% group_by(type) %>% select(zscore < 0 & zscore >0)

> dput(test)
structure(list(zscore = c(0.775767217942226, 1.50431110062286, 
0.96844973768628, 1.13270780763826, -0.417688838401554, -1.16388703634018, 
-0.79027804814387, 0.497003183210411, -0.79027804814387, -1.16388703634018, 
1.37434643916904, -0.79027804814387, -0.79027804814387, -1.16388703634018, 
-0.572244270676755, -1.16388703634018, 0.291606810323695, 0.44589189244246, 
1.1176046488225, -1.16388703634018, 0.21485327846621, -0.417688838401554, 
-1.16388703634018, -0.117261781255063, -1.16388703634018, 1.22796644918627, 
-0.417688838401554, -1.16388703634018, -0.19999370182559, -1.16388703634018
), type = c("Nucleotide Metabolism", "Folding Sorting and Degradation", 
"Cell Motility", "Genetic Information Processing", "Carbohydrate Metabolism", 
"Glycan Biosynthesis and Metabolism", "Carbohydrate Metabolism", 
"Genetic Information Processing", "Metabolism", "Carbohydrate Metabolism", 
"Membrane Transport", "Transport and Catabolism", "Genetic Information Processing", 
"Amino Acid Metabolism", "Transport and Catabolism", "Translation", 
"Translation", "Lipid Metabolism", "Glycan Biosynthesis and Metabolism", 
"Amino Acid Metabolism", "Genetic Information Processing", "Glycan Biosynthesis and Metabolism", 
"Carbohydrate Metabolism", "Metabolism", "Xenobiotics Biodegradation and Metabolism", 
"Signal Transduction", "Genetic Information Processing", "Signal Transduction", 
"Carbohydrate Metabolism", "Metabolism")), row.names = c(1L, 
2L, 3L, 4L, 6L, 7L, 9L, 15L, 17L, 20L, 23L, 24L, 27L, 28L, 32L, 
35L, 36L, 37L, 38L, 39L, 41L, 44L, 45L, 47L, 48L, 49L, 51L, 52L, 
54L, 57L), class = "data.frame")

Solution

  • Use filter() rather than select() to pick rows; select() chooses columns. Also, no single value can be both negative and positive, so the condition zscore < 0 & zscore > 0 is never true for any individual row. Use any() to test across all values in a group instead. Try

    library(dplyr)

    test %>%
      group_by(type) %>%
      # keep a group only if it has at least one negative and one positive z-score
      filter(any(zscore < 0) & any(zscore > 0))
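
For comparison, the same "any negative and any positive within a group" idea can be sketched in base R without dplyr. This is an alternative and not the approach given in the answer above. The data frame toy and the names has_pos, has_neg, and both_signs are made up for illustration; toy is a small stand-in for test with the same two columns.

```r
# Hypothetical toy data with the same columns as `test` (zscore, type)
toy <- data.frame(
  zscore = c(0.5, -0.3, 1.2, 0.8, -1.1, -0.4),
  type   = c("A", "A", "B", "B", "C", "C")
)

# ave() applies FUN within each group and returns one value per row,
# so each row is flagged with its group's result
has_pos <- as.logical(ave(toy$zscore > 0, toy$type, FUN = any))
has_neg <- as.logical(ave(toy$zscore < 0, toy$type, FUN = any))

# Keep rows whose group contains both signs; here only the two rows of type "A"
both_signs <- toy[has_pos & has_neg, ]
print(both_signs)
```

Unlike the dplyr version, this returns a plain, ungrouped data frame and keeps the original row names.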