I have a dataset that looks like this:
  number  fruit status
1      1  apple   ripe
2      2  apple rotten
3      3 banana   ripe
4      4 banana rotten
5      5   pear   ripe
6      6   pear rotten
7      7  apple   ripe
DF <- data.frame(number = 1:7,
                 fruit = c(rep(c("apple", "banana", "pear"), each = 2), "apple"),
                 status = c(rep(c("ripe", "rotten"), 3), "ripe"))
I would like to loop over "fruit" and return the levels of "status" for each fruit. That is, get out something like this:
$apple
[1] ripe rotten
$banana
[1] ripe rotten
$pear
[1] ripe rotten
It doesn't have to be a list; I just need to know the levels of "status" within each "fruit" level. My data is more complicated than the example, so assume I can't just remove the "number" column.
I am trying to use apply functions or dplyr and I can't figure out how to get this.
1) tapply/unique Assuming only the unique values of status are wanted, this base R solution could be used:
with(DF, tapply(as.character(status), fruit, unique, simplify = FALSE))
giving:
$apple
[1] "ripe" "rotten"
$banana
[1] "ripe" "rotten"
$pear
[1] "ripe" "rotten"
2) split If it were known that the sublevels within each level are already unique, then this base R solution would be sufficient and would give the same result.
with(DF, split(as.character(status), fruit))
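When that uniqueness assumption does not hold (in the sample data, apple is ripe twice), one possible variant is to deduplicate each piece after splitting. This is only a minimal sketch, assuming DF is the data frame from the question; the names s and u are just illustrative.

```r
# Sample data reconstructed from the question (assumed to be called DF)
DF <- data.frame(number = 1:7,
                 fruit = c(rep(c("apple", "banana", "pear"), each = 2), "apple"),
                 status = c(rep(c("ripe", "rotten"), 3), "ripe"))

# split() keeps every value, so apple comes back with "ripe" repeated
s <- with(DF, split(as.character(status), fruit))

# applying unique() to each list element drops the repeats
u <- lapply(s, unique)
u
```

The extra lapply step makes this equivalent in spirit to the tapply/unique approach in 1), so it is mainly useful if the split result is needed anyway.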
3) table Another form of output that might be useful is a table showing the number of occurrences of each sublevel within each level. Again, this uses only base R.
m <- table(DF[-1])
m
giving:
        status
fruit    ripe rotten
  apple     2      1
  banana    1      1
  pear      1      1
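If the list form from the question is still wanted, it can probably be recovered from the table by keeping, for each fruit, the status columns with a nonzero count. A rough sketch, again assuming the question's DF; the name present is made up for illustration.

```r
# Sample data reconstructed from the question (assumed to be called DF)
DF <- data.frame(number = 1:7,
                 fruit = c(rep(c("apple", "banana", "pear"), each = 2), "apple"),
                 status = c(rep(c("ripe", "rotten"), 3), "ripe"))

# contingency table of fruit (rows) by status (columns)
m <- table(DF[-1])

# for each fruit, keep the names of the status columns whose count is above zero
fruits <- rownames(m)
present <- lapply(setNames(fruits, fruits),
                  function(f) colnames(m)[m[f, ] > 0])
present
```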
We can create a bipartite graph of this using the igraph package:
library(igraph)
# note: recent igraph releases rename this function to graph_from_biadjacency_matrix()
g <- graph_from_incidence_matrix(m)
plot(g, layout = layout_as_bipartite)