I have multiple species presence observations per location, split by collection time, but I would like to know whether each species appeared at that location at any time. My data currently looks like this:
### Location Collection_time Species Presence
# loc1 6-8PM Sp1 Y
# loc1 6-8PM Sp2 N
# loc1 8-10PM Sp1 N
# loc1 8-10PM Sp2 Y
# loc1 10-12PM Sp1 N
# loc1 10-12PM Sp2 N
# loc2 6-8PM Sp1 Y
# loc2 6-8PM Sp2 N
# loc2 8-10PM Sp1 N
# loc2 8-10PM Sp2 N
# loc2 10-12PM Sp1 N
# loc2 10-12PM Sp2 N
What I would like to achieve is a new data frame with one presence/absence value per location and species, regardless of collection time, like so:
### Location Species Presence
# loc1 Sp1 Y
# loc1 Sp2 Y
# loc2 Sp1 Y
# loc2 Sp2 N
I'm new to R and don't yet have a strong enough grasp of it to work out how to achieve this, so I'm stuck before the stage where I have any reasonably lucid attempts at code. Thanks in advance for any help!
A base R solution:
aggregate(Presence ~ Location + Species, data = df, FUN = max, na.rm = TRUE)
# Location Species Presence
# 1 loc1 Sp1 Y
# 2 loc2 Sp1 Y
# 3 loc1 Sp2 Y
# 4 loc2 Sp2 N
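You can use max() here because max("Y", "N") returns "Y": character strings are compared in alphabetical (collation) order, and "Y" sorts after "N". Note that this relies on Presence being a character column; max() is not defined for unordered factors, so convert with as.character() first if needed.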
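For anyone who wants to try this out, here is a minimal reproducible sketch. It rebuilds the example data from the question and assumes the data frame is called df and that Presence is stored as character. The names by_max and by_any are made up for illustration. The second variant is not part of the answer above; it is just a more explicit way to say "Y if any collection time has a Y", which does not depend on how the strings happen to sort.

```r
# Rebuild the example data from the question (character columns, not factors).
df <- data.frame(
  Location        = rep(c("loc1", "loc2"), each = 6),
  Collection_time = rep(rep(c("6-8PM", "8-10PM", "10-12PM"), each = 2), times = 2),
  Species         = rep(c("Sp1", "Sp2"), times = 6),
  Presence        = c("Y", "N", "N", "Y", "N", "N",   # loc1
                      "Y", "N", "N", "N", "N", "N"),  # loc2
  stringsAsFactors = FALSE
)

# Variant 1: max() on the "Y"/"N" strings, as in the answer.
by_max <- aggregate(Presence ~ Location + Species, data = df,
                    FUN = max, na.rm = TRUE)

# Variant 2 (alternative, hypothetical helper): explicit "any Y" test per group.
by_any <- aggregate(
  Presence ~ Location + Species, data = df,
  FUN = function(x) if (any(x == "Y", na.rm = TRUE)) "Y" else "N"
)

print(by_max)
print(by_any)
```

Both calls should give one row per Location/Species combination. If your labels were something other than "Y"/"N" (say "present"/"absent"), the max() trick could silently pick the wrong one, so the explicit test is probably the safer habit.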