I have the following data (please let me know if the link doesn't work; it's my first time uploading to github):
https://github.com/scottr2012/test_r_data/blob/master/2017_Annual_Averages_ALL.csv
I have some data that has values for ANC. I need to select where any of the SITES have ANC > 150, but keep all years of that SITE, even if the ANC is below 150. Currently the code below removes some of the values (and years) below 150. I need all SITES where any of the years has ANC above 150. This code currently seems to only make a list of unique sites (where ANC >150 at any point), but doesn't bring over the rest of the data.
vtsss <- mydata[ which(mydata$PROGRAM=='VTSSS' & mydata$ANC >= 150), ] # Pick a subset, in this case, VTSSS
unique_vtsss <- unique(vtsss$SITE)
vtsss2 <- mydata[ which(mydata[unique_vtsss]), ]
I get the following error:
Error in `[.data.frame`(mydata, unique_vtsss) :
undefined columns selected
Here's where I subset the data but it still will remove some years with ANC less than 150.
vtsss <- subset(mydata, PROGRAM == 'VTSSS' & ANC >= 150,
select=c(PROGRAM, SITE, YEAR, ANC))
I created a small example of data which resembles your csv and I think that subsequent code does what you are asking:
PROGRAM <- c('VTSSS', 'VTSSS', 'VTSSS', 'VTSSS', 'VTSSS', 'VTSSS','VTSSS','VTSSS','other')
SITE <- c("A", "A", "A", "B", "B", "B", "C", "C", "C")
YEAR <- c(2018, 2019, 2020, 2018, 2019, 2020, 2018, 2019, 2020)
ANC <- c(1, 1, 1, 160, 160, 160, 1, 160, 160)
mydata <- data.frame(PROGRAM, SITE, YEAR, ANC)
vtsss <- mydata[ which(mydata$PROGRAM =='VTSSS'), ]
vtsss2 <- vtsss[ which(vtsss$ANC >= 150), ]
vtsss2 <- subset(vtsss2, !duplicated(vtsss2$SITE))
vtsss3 <- vtsss[ which(vtsss$SITE %in% vtsss2$SITE), ]