I am using a data frame that includes Pixel ID
, Year
, XY coordinates, and Temperature
values. For some Pixel IDs, the Temperature
value is NA
for each year (see Pixel ID
1). For other pixel IDs, the Temperature
value is 'NA' for only some years (see Pixel ID
2). What I would like is to find and remove those Pixel ID
if all of the values for all of the years (Year
always repeats from 2001-2005) are NA
. I would like to keep those Pixel ID
if even one of the values is a non-NA
value.
Here is an example of the data frame I am working with
> head(Temperature_sorted, n = 10)
# A tibble: 10 x 5
PixelID Year X Y Temperature
<dbl> <dbl> <dbl> <dbl> <dbl>
1 1 2001 70.8 73.5 NA
2 1 2002 70.8 73.5 NA
3 1 2003 70.8 73.5 NA
4 1 2004 70.8 73.5 NA
5 1 2005 70.8 73.5 NA
6 2 2001 70.8 73.5 0.2
7 2 2002 70.8 73.5 0.4
8 2 2003 70.8 73.5 NA
9 2 2004 70.8 73.5 0.5
10 2 2005 70.8 73.5 0.3
and here is an example of the output I would like
> head(Temperature_sorted, n = 10)
# A tibble: 10 x 5
PixelID Year X Y Temperature
<dbl> <dbl> <dbl> <dbl> <dbl>
1 2 2001 70.8 73.5 0.2
2 2 2002 70.8 73.5 0.4
3 2 2003 70.8 73.5 NA
4 2 2004 70.8 73.5 0.5
5 2 2005 70.8 73.5 0.3
6 3 2001 70.8 73.5 NA
7 3 2002 70.8 73.5 0.7
8 3 2003 70.8 73.5 0.9
9 3 2004 70.8 73.5 NA
10 3 2005 70.8 73.5 0.9
I have several thousand Pixel ID
values so I would like to use a for loop if possible.
Using dplyr:
library(dplyr)
Temperature_sorted %>%
group_by(PixelID) %>%
filter(any(!is.na(Temperature)))
First we group rows by PixelID, then for each row, we check if any rows with the same PixelID have a Temperature value that is not NA.