How to remove all rows where no column has a value of 1?


I have a presence/absence matrix of species occurrences, which looks like this:

                    coords.x  coords.y  1  2  3
Point 1            -69.07354 -15.76097  0 NA NA
Point 2            -69.91902 -15.86905  1  1 NA
Point 3            -69.90793 -15.79660  0  0  1
Point 4            -69.86849 -15.86500  0  0 NA
Point 5            -69.84020 -15.81637  1  0 NA

I want to extract the coordinates of the presence points, for the purposes of thresholding species distribution models. For the snippet above, the result would look like this:

                    coords.x  coords.y  1  2  3
Point 2            -69.91902 -15.86905  1  1 NA
Point 3            -69.90793 -15.79660  0  0  1
Point 5            -69.84020 -15.81637  1  0 NA

Originally I tried:

SpeciesPoints <- filter(Species_Data,"1" = 0, "2" = 0,"3" = 0)

And I got the following error message:

Error in UseMethod("filter") : 
  no applicable method for 'filter' applied to an object of class "c('matrix', 'array', 'double', 'numeric')"

After converting to a dataframe, I tried:

filter(Species_Data,"1" == 1 , "2" == 1,"3" == 1)

But this created an empty dataframe:

[1] coords.x coords.y 1        2        3       
<0 rows> (or 0-length row.names)

I then tried to see if the OR operator would work, as follows:

filter(Species_Data,"1" == 1 | "2" == 1|"3" == 1)

But this did not remove any rows. How do I filter so that a single value of 1 in any of columns 1, 2, or 3 is sufficient to keep the row, and rows without a 1 in any of those columns are removed?


Solution

  • You can use dplyr's if_any() to keep every row in which at least one of the non-coordinate columns equals 1:

    library(dplyr)
    
    filter(df, if_any(!contains("coords"), ~ .x == 1))
    
    #         coords.x  coords.y 1 2  3
    #Point 2 -69.91902 -15.86905 1 1 NA
    #Point 3 -69.90793 -15.79660 0 0  1
    #Point 5 -69.84020 -15.81637 1 0 NA
    

    Data:

    > dput(df)
    structure(list(coords.x = c(-69.07354, -69.91902, -69.90793, 
    -69.86849, -69.8402), coords.y = c(-15.76097, -15.86905, -15.7966, 
    -15.865, -15.81637), `1` = c(0, 1, 0, 0, 1), `2` = c(NA, 1, 0, 
    0, 0), `3` = c(NA, NA, 1, NA, NA)), class = "data.frame", row.names = c("Point 1", 
    "Point 2", "Point 3", "Point 4", "Point 5"))
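
As for why the attempts in the question behaved the way they did: inside filter(), a double-quoted "1" is a string literal, not a reference to the column named 1. R coerces the number to a string for the comparison, so "1" == 1 is a single TRUE while "2" == 1 and "3" == 1 are FALSE. Combined with AND, that gives FALSE for every row (an empty result); combined with OR, it gives TRUE for every row (nothing removed). Non-syntactic column names need backticks instead. The sketch below rebuilds df from the dput() output above and shows the comparison, the backtick-quoted version of the OR filter, and a base R equivalent built on rowSums(). The names species_cols, has_presence, kept_or and presence_points are illustrative and do not come from the question.

```r
library(dplyr)

# Data rebuilt from the dput() output above; check.names = FALSE keeps
# the numeric column names "1", "2", "3" unchanged.
df <- data.frame(
  coords.x = c(-69.07354, -69.91902, -69.90793, -69.86849, -69.8402),
  coords.y = c(-15.76097, -15.86905, -15.7966, -15.865, -15.81637),
  `1` = c(0, 1, 0, 0, 1),
  `2` = c(NA, 1, 0, 0, 0),
  `3` = c(NA, NA, 1, NA, NA),
  row.names = paste("Point", 1:5),
  check.names = FALSE
)

# A quoted name is a string constant, so these comparisons never look
# at the data: the number 1 is coerced to the string "1".
"1" == 1   # TRUE
"2" == 1   # FALSE

# Backticks refer to the columns themselves. Rows where the condition
# evaluates to NA (no 1 present, at least one NA) are dropped by filter().
kept_or <- filter(df, `1` == 1 | `2` == 1 | `3` == 1)

# Base R equivalent: count the 1s per row, ignoring NA, and keep rows
# with at least one.
species_cols <- setdiff(names(df), c("coords.x", "coords.y"))
has_presence <- rowSums(df[species_cols] == 1, na.rm = TRUE) > 0
presence_points <- df[has_presence, ]
```

Both kept_or and presence_points should contain Points 2, 3 and 5. The if_any() form scales better when there are many species columns, since the columns do not have to be listed by hand.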