Search code examples
rselectfiltersubsetunique

In R, how do I select/subset sites with values greater than a value, but then keep all sites that contain values less than the selected value?


I have the following data (please let me know if the link doesn't work; it's my first time uploading to github):

https://github.com/scottr2012/test_r_data/blob/master/2017_Annual_Averages_ALL.csv

I have some data that has values for ANC. I need to select where any of the SITES have ANC > 150, but keep all years of that SITE, even if the ANC is below 150. Currently the code below removes some of the values (and years) below 150. I need all SITES where any of the years has ANC above 150. This code currently seems to only make a list of unique sites (where ANC >150 at any point), but doesn't bring over the rest of the data.

vtsss <- mydata[ which(mydata$PROGRAM=='VTSSS' & mydata$ANC >= 150), ] # Pick a subset, in this case, VTSSS

unique_vtsss <- unique(vtsss$SITE)

vtsss2 <- mydata[ which(mydata[unique_vtsss]), ] 

I get the following error:

Error in `[.data.frame`(mydata, unique_vtsss) : 
  undefined columns selected

Here's where I subset the data but it still will remove some years with ANC less than 150.

vtsss <- subset(mydata, PROGRAM == 'VTSSS' & ANC >= 150, 
select=c(PROGRAM, SITE, YEAR, ANC))

Solution

  • I created a small example of data which resembles your csv and I think that subsequent code does what you are asking:

    PROGRAM <- c('VTSSS', 'VTSSS', 'VTSSS', 'VTSSS', 'VTSSS', 'VTSSS','VTSSS','VTSSS','other') 
    SITE <- c("A", "A", "A", "B", "B", "B", "C", "C", "C") 
    YEAR <- c(2018, 2019, 2020, 2018, 2019, 2020, 2018, 2019, 2020) 
    ANC <- c(1, 1, 1, 160, 160, 160, 1, 160, 160)
    mydata <- data.frame(PROGRAM, SITE, YEAR, ANC)
    
    vtsss <- mydata[ which(mydata$PROGRAM =='VTSSS'), ]
    vtsss2 <- vtsss[ which(vtsss$ANC >= 150), ]
    vtsss2 <- subset(vtsss2, !duplicated(vtsss2$SITE))
    vtsss3 <- vtsss[ which(vtsss$SITE %in% vtsss2$SITE), ]