How to aggregate multiple observations of presence/absence data within a location in R?


I have multiple species presence observations for each location, split by collection time, but I would like a single value showing whether each species appeared at that location at any time. My data currently looks like this:

### Location   Collection_time   Species   Presence
#    loc1        6-8PM             Sp1        Y
#    loc1        6-8PM             Sp2        N
#    loc1        8-10PM            Sp1        N
#    loc1        8-10PM            Sp2        Y
#    loc1        10-12PM           Sp1        N
#    loc1        10-12PM           Sp2        N
#    loc2        6-8PM             Sp1        Y
#    loc2        6-8PM             Sp2        N
#    loc2        8-10PM            Sp1        N
#    loc2        8-10PM            Sp2        N
#    loc2        10-12PM           Sp1        N
#    loc2        10-12PM           Sp2        N

What I would like is a new data frame with one presence/absence value per species per location, regardless of collection time, like this:

### Location  Species   Presence
     loc1      Sp1          Y 
     loc1      Sp2          Y 
     loc2      Sp1          Y 
     loc2      Sp2          N 

I'm new to R and don't yet have a strong enough grasp of it to work out how to do this, so I have no reasonable code attempts to show. Thanks in advance for any help!


Solution

  • A base R solution

    aggregate(Presence ~ Location + Species, df, max, na.rm = TRUE)
    
    #   Location Species Presence
    # 1     loc1     Sp1        Y
    # 2     loc2     Sp1        Y
    # 3     loc1     Sp2        Y
    # 4     loc2     Sp2        N
    

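    This works because max() on character values compares strings in sort order, and "Y" sorts after "N", so max("Y", "N") returns "Y". Note that this relies on Presence being a character column; max() raises an error on an unordered factor, so convert with as.character() first if needed.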
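
    If relying on string ordering feels too implicit, the rule "present if any collection time recorded Y" can be spelled out directly. The following is a minimal sketch, not the only way to do it. It assumes the data frame is called df (as in the call above) and that Presence holds the character values "Y" and "N"; the data frame is rebuilt here from the table in the question so the snippet runs on its own, and the name result is just illustrative.

```r
# Rebuild the example data from the question.
df <- data.frame(
  Location        = rep(c("loc1", "loc2"), each = 6),
  Collection_time = rep(rep(c("6-8PM", "8-10PM", "10-12PM"), each = 2), times = 2),
  Species         = rep(c("Sp1", "Sp2"), times = 6),
  Presence        = c("Y", "N", "N", "Y", "N", "N",   # loc1
                      "Y", "N", "N", "N", "N", "N"),  # loc2
  stringsAsFactors = FALSE  # keep Presence as character, not factor
)

# A species counts as present at a location if any time slot recorded "Y".
result <- aggregate(
  Presence ~ Location + Species,
  data = df,
  FUN  = function(x) if (any(x == "Y")) "Y" else "N"
)
result
```

    This should give the same four rows as the max() version, but the condition is stated explicitly, so it does not depend on how "Y" and "N" happen to sort.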