Search code examples
rfor-loopgeo

How do I find and remove row chunks that all have missing data in an R data.frame?


I am using a data frame that includes Pixel ID, Year, XY coordinates, and Temperature values. For some Pixel IDs, the Temperature value is NA for each year (see Pixel ID 1). For other pixel IDs, the Temperature value is 'NA' for only some years (see Pixel ID 2). What I would like is to find and remove those Pixel ID if all of the values for all of the years (Year always repeats from 2001-2005) are NA. I would like to keep those Pixel ID if even one of the values is a non-NA value.

Here is an example of the data frame I am working with

> head(Temperature_sorted, n = 10)
# A tibble: 10 x 5
   PixelID  Year     X     Y Temperature
     <dbl> <dbl> <dbl> <dbl>       <dbl>
 1       1  2001  70.8  73.5          NA
 2       1  2002  70.8  73.5          NA
 3       1  2003  70.8  73.5          NA
 4       1  2004  70.8  73.5          NA
 5       1  2005  70.8  73.5          NA
 6       2  2001  70.8  73.5         0.2
 7       2  2002  70.8  73.5         0.4
 8       2  2003  70.8  73.5          NA
 9       2  2004  70.8  73.5         0.5
10       2  2005  70.8  73.5         0.3

and here is an example of the output I would like

> head(Temperature_sorted, n = 10)
# A tibble: 10 x 5
   PixelID  Year     X     Y Temperature
     <dbl> <dbl> <dbl> <dbl>       <dbl>
 1       2  2001  70.8  73.5         0.2
 2       2  2002  70.8  73.5         0.4
 3       2  2003  70.8  73.5          NA
 4       2  2004  70.8  73.5         0.5
 5       2  2005  70.8  73.5         0.3
 6       3  2001  70.8  73.5          NA
 7       3  2002  70.8  73.5         0.7
 8       3  2003  70.8  73.5         0.9
 9       3  2004  70.8  73.5          NA
10       3  2005  70.8  73.5         0.9

I have several thousand Pixel ID values so I would like to use a for loop if possible.


Solution

  • Using dplyr:

    library(dplyr)
    
    Temperature_sorted %>%
      group_by(PixelID) %>%
      filter(any(!is.na(Temperature)))
    

    First we group rows by PixelID, then for each row, we check if any rows with the same PixelID have a Temperature value that is not NA.