I have a dataset containing the order, species, and various trait data for bird species. I am trying to subset it so that it keeps only data on certain orders. Normally, when subsetting by species, I merge datasets so that only rows with matching row names (i.e. species names) are kept in the new data frame. However, read.csv() does not allow duplicate row names, and since the Order names would be duplicated, I cannot use them as row names. How can I subset the data so that the new data frame contains only information on select Orders? This is what the Order and Species columns look like:
Order1 Species
Acanthagenys_rufogularis Passeriformes Acanthagenys_rufogularis
Acanthiza_apicalis Passeriformes Acanthiza_apicalis
Acanthiza_chrysorrhoa Passeriformes Acanthiza_chrysorrhoa
Acanthiza_lineata Passeriformes Acanthiza_lineata
Acanthiza_nana Passeriformes Acanthiza_nana
Acanthiza_pusilla Passeriformes Acanthiza_pusilla
Acanthiza_reguloides Passeriformes Acanthiza_reguloides
Acanthiza_uropygialis Passeriformes Acanthiza_uropygialis
Acanthorhynchus_tenuirostris Passeriformes Acanthorhynchus_tenuirostris
Accipiter_cirrocephalus Accipitriformes Accipiter_cirrocephalus
While the first 10 lines contain only 2 orders, the dataset contains information on 26 avian orders, and I am only interested in Passeriformes, Charadriiformes, Psittaciformes, and Struthioniformes.
Assuming your data frame is called bird_df after you've read it in with read.csv(), you can subset the data to keep only the rows where Order1 is Passeriformes, Charadriiformes, Psittaciformes, or Struthioniformes using:
subset_bird_df <- bird_df[bird_df$Order1 %in% c("Passeriformes", "Charadriiformes", "Psittaciformes", "Struthioniformes"),]
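To see the filter working end to end, here is a minimal sketch on a small toy data frame. The toy rows and the names keep_orders and subset_bird_df2 are assumptions made for illustration and are not part of your data. Note that the species names can stay as row names, because the filter works on the Order1 column and does not need the orders to be row names.

```r
# Toy data standing in for the real file (illustrative rows only)
bird_df <- data.frame(
  Order1  = c("Passeriformes", "Accipitriformes", "Psittaciformes", "Passeriformes"),
  Species = c("Acanthiza_nana", "Accipiter_cirrocephalus", "Cacatua_galerita", "Acanthiza_pusilla")
)
# Species names are unique, so they are valid row names
rownames(bird_df) <- bird_df$Species

# Orders to keep (hypothetical helper vector)
keep_orders <- c("Passeriformes", "Charadriiformes", "Psittaciformes", "Struthioniformes")

# Logical indexing: keep rows whose Order1 value is in keep_orders
subset_bird_df <- bird_df[bird_df$Order1 %in% keep_orders, ]

# The same filter written with base R's subset()
subset_bird_df2 <- subset(bird_df, Order1 %in% keep_orders)

# If Order1 was read in as a factor, drop the levels that no longer occur
subset_bird_df <- droplevels(subset_bird_df)

# Count the remaining rows per order
table(subset_bird_df$Order1)
```

Orders listed in keep_orders that do not occur in the data (here Charadriiformes and Struthioniformes) are simply ignored by %in%, so the same vector can be reused across files.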