**There are NAs in the min.range column**
In R, I have a dataframe with a list of species, each with two associated numeric columns (overall.percentage and min.range). Each species has multiple rows (representing different populations). This is a sample of the dataframe:
species <- data.frame(species = c("dog", "dog", "dog", "cat", "cat", "fish", "fish"),
                      overall.percentage = c(12, 13, 19, 20, 12, 10, 50),
                      min.range = c(19, 19, 99, 1, 2, NA, 26))
I would like to create a new column that ranks each species (not each population) by the lowest value that species has in either of the two numeric columns.
For the data above, the result would be:
species <- data.frame(species = c("dog", "dog", "dog", "cat", "cat", "fish", "fish"),
                      overall.percentage = c(12, 13, 19, 20, 12, 10, 50),
                      min.range = c(19, 19, 99, 1, 2, NA, 26),
                      order = c(3, 3, 3, 1, 1, 2, 2))
(Explanation: "cat" gets 1 in "order" because it has the lowest value (1) across overall.percentage and min.range of all species. "fish" gets 2 because its lowest value, 10, is the second lowest, and "dog" gets 3 because its lowest value, 12, is the highest of the three.)
I guess I need to: (1) group the dataframe by species (with the aggregate function?), (2) select the lowest value within each species (considering both numeric columns), and (3) create a new numeric column that gives each species its rank among all species. The end goal is to use this new "order" column to sort the species in ascending order for plotting with ggplot.
I managed to use the aggregate function to find the minimum value within each species for one of the columns, but not for both: min_by_species <- aggregate(overall.percentage ~ species.English.name, data = species, min)
Also, it is not clear to me how to then create the "order" column that will be used for the ggplot.
Here is one option to achieve your desired result using ave():
species <- data.frame(
  species = c("dog", "dog", "dog", "cat", "cat", "fish", "fish"),
  overall.percentage = c(12, 13, 19, 20, 12, 10, 50),
  min.range = c(19, 19, 99, 1, 2, 44, 26)
)
species |>
  transform(
    value = ave(
      pmin(overall.percentage, min.range), species,
      FUN = min
    )
  ) |>
  transform(
    order = as.numeric(
      reorder(species, value)
    )
  ) |>
  subset(select = -value)
#>   species overall.percentage min.range order
#> 1     dog                 12        19     3
#> 2     dog                 13        19     3
#> 3     dog                 19        99     3
#> 4     cat                 20         1     1
#> 5     cat                 12         2     1
#> 6    fish                 10        44     2
#> 7    fish                 50        26     2
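Note that the sample data above uses 44 where the question has an NA. If the real data keeps the NA, pmin() and min() will return NA for that species unless told to skip missing values. A minimal base R sketch of one way to handle that, assuming an NA should simply be ignored whenever the other column has a value (the helper names row_min and species_min are made up for illustration):

```r
# Same sample data as in the question, including the NA in min.range
species <- data.frame(
  species = c("dog", "dog", "dog", "cat", "cat", "fish", "fish"),
  overall.percentage = c(12, 13, 19, 20, 12, 10, 50),
  min.range = c(19, 19, 99, 1, 2, NA, 26)
)

# Row-wise minimum of the two columns; na.rm = TRUE skips an NA
# when the other column has a value
row_min <- pmin(species$overall.percentage, species$min.range, na.rm = TRUE)

# Lowest value per species, repeated on every row of that species
species_min <- ave(row_min, species$species,
                   FUN = function(x) min(x, na.rm = TRUE))

# Rank of each species' minimum among the distinct minima (1 = lowest)
species$order <- match(species_min, sort(unique(species_min)))

species$order
#> [1] 3 3 3 1 1 2 2
```

This swaps reorder() for match() on the sorted unique minima, so two species that share the same lowest value would get the same rank rather than distinct ones. A species whose values are all NA in both columns would get Inf (with a warning) from min(), so such rows may need to be dropped first.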