Search code examples
rtimerowsdata-extraction

Extract the rows based on the specific values in the column by time


I have a data frame looks like :

Date v1 v2
2000/8/10 S 2600
2000/8/10 S 500
2000/8/10 S 3500
2001/11/12 B 2000
2001/11/22 B 1000
2001/11/22 S 1000
2001/12/06 B 1000
2001/12/06 B 1500
2001/12/06 B 1000
2001/12/07 S 2000
2001/12/07 B 4000

However, I'd like to extract the rows which has "B" and "S" in the same date It looks as below :

Date v1 v2
2001/11/22 B 1000
2001/11/22 S 1000
2001/12/07 S 2000
2001/12/07 B 4000

Does anyone have ideas? That would be grateful ! Cheers


Solution

  • You can select the Date which has both 'S' and 'B' in them.

    library(dplyr)
    
    df %>%
      group_by(Date) %>%
      filter(all(c('S', 'B') %in% v1)) %>%
      ungroup
    
    #  Date       v1       v2
    #  <chr>      <chr> <int>
    #1 2001/11/22 B      1000
    #2 2001/11/22 S      1000
    #3 2001/12/07 S      2000
    #4 2001/12/07 B      4000
    

    This logic can also be implemented in base R and data.table :

    #Base R
    subset(df, as.logical(ave(v1, Date, FUN = function(x) all(c('S', 'B') %in% x))))
    
    #data.table
    library(data.table)
    setDT(df)[, .SD[all(c('S', 'B') %in% v1)], Date]
    

    data

    df <- structure(list(Date = c("2000/8/10", "2000/8/10", "2000/8/10", 
    "2001/11/12", "2001/11/22", "2001/11/22", "2001/12/06", "2001/12/06", 
    "2001/12/06", "2001/12/07", "2001/12/07"), v1 = c("S", "S", "S", 
    "B", "B", "S", "B", "B", "B", "S", "B"), v2 = c(2600L, 500L, 
    3500L, 2000L, 1000L, 1000L, 1000L, 1500L, 1000L, 2000L, 4000L
    )), row.names = c(NA, -11L), class = "data.frame")