How do you split a tibble based on a condition in R?


Say I have the following tibble:

df <- tibble(name = c("CTX_M", "CblA_1", "OXA_1", "ampC"),
             rpkm = c(350, 4, 0, 0))

and I want to split the tibble into one where rpkm = 0, and a second where rpkm > 0.

I've tried to create a function to select the rows where rpkm = 0, as follows:

zero <- function(data){
  input = data
  if(input[, 2] == 0){
    n = input
    print(n)
  }
}

but I get the following warning when I try to run it like this:

Zero <- zero(df)

Warning message:
In if (input[, 2] == 0) { :
  the condition has length > 1 and only the first element will be used

As I'm not very experienced with R, I'm not sure what is going wrong or how to approach this.


Solution

  • You can use dplyr, a package from the tidyverse family that provides many functions for working with data. Its filter() function keeps only the rows that match a condition, so calling it once per condition gives you the two tibbles.

    #library of interest
    library(dplyr)
    
    ##Your data
    df <- tibble(name = c("CTX_M", "CblA_1", "OXA_1", "ampC"),
                 rpkm = c(350, 4, 0, 0))
    
    ##Using the filter function to get all = 0
    df_filt1 <- df %>% 
      filter(rpkm == 0)
    
    ##see what the filtering looks like
    df_filt1
    
    #> # A tibble: 2 x 2
    #>   name   rpkm
    #>   <chr> <dbl>
    #> 1 OXA_1     0
    #> 2 ampC      0
    
    ##Using the filter function to get all > 0
    df_filt2 <- df %>% 
      filter(rpkm > 0)
    
    ##see what the filtering looks like
    df_filt2
    
    #> # A tibble: 2 x 2
    #>   name    rpkm
    #>   <chr>  <dbl>
    #> 1 CTX_M    350
    #> 2 CblA_1     4
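
As for why the original function fails: if() expects a single TRUE or FALSE, but input[, 2] == 0 produces one value per row, hence the warning (in R 4.2.0 and later it is an error). A minimal base R sketch of the same split, using that comparison to index rows instead of passing it to if(), could look like the following. The names is_zero, zero, non_zero and parts are made up for illustration, a plain data.frame stands in for the tibble so that no packages are needed, and the split() step assumes rpkm is never negative or NA.

```r
# Same data as in the question; a base data.frame stands in for the tibble
# so that this sketch runs without any packages (assumption for illustration).
df <- data.frame(name = c("CTX_M", "CblA_1", "OXA_1", "ampC"),
                 rpkm = c(350, 4, 0, 0))

# The comparison is vectorised: it yields one TRUE/FALSE per row.
# That is exactly what if() cannot handle, but it is ideal for indexing.
is_zero <- df$rpkm == 0

# Index rows with a logical vector instead of testing it with if()
zero     <- df[is_zero, ]        # rows where rpkm == 0
non_zero <- df[df$rpkm > 0, ]    # rows where rpkm > 0

# split() returns both pieces at once as a named list.
# Assumes rpkm is never negative or NA, so "not zero" means "greater than zero".
parts <- split(df, ifelse(is_zero, "zero", "non_zero"))
```

The same logical indexing should work on the tibble itself, and parts$zero and parts$non_zero hold the two pieces if you prefer to keep them together in one list.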