My goal is to expand
my data.frame in R
to include possible combinations (but not all possible combinations) from a column in R. Similar to the expand.grid
command, but that function gives you all possible combinations, not just what is present.
To start, I need to group by each factor in the 1st column, and keep the information included in column 2. In column 3, I have character strings of the names of 'Animals'. I want to find each possible pair that occur in this column, row by row (but not all possible pairs). For example, if I have 'Dreadwing' and 'Scorcher' in the first two rows - that would be one pair: Dreadwing-Scorcher and it should not include Scorcher-Dreadwing. However, if rows 4 and five are T-Rex and T-Rex, the pair should appear once: T-Rex-T-Rex because T-Rex appears in 2 separate rows of the column 'Animals.' If T-Rex were to appear in 3 separate rows, then the pair should appear 3 times, etc., etc.
Lastly, pairs should expand the data.frame by 2 columns to store the pairs. In other words, the Dreadwing and Scorcher pair should be each in their own separate column, but in the same row.
I have manually put together this picture to show what my output should be from the data.frame I have (note: Area_1 and Area_2 are separated only to fit the results in one screenshot). The left: I have put arrows showing the desired combinations just from the first row, Dreadwing. On the right: the desired result for all of Area_1 and Area_2.
For the desired result, for Area_1, the Dreadwing-Dreadwing pair should not occur bc Dreadwing does not appear in any other row for Area_1. However, T-Rex appears in 2 separate rows, so the combination of T-Rex-T-Rex should be there, as well as the combination of each row of T-Rex combining with each row of Waterwing. So, 4 T-Rex-Waterwing combinations.
Reproducible Data
Creating the data.frame
v <- c(rep("Area_1", 7), rep("Area_2", 7))
w <- c(rep("Forest", 7), rep("Cave", 7))
y <- c("Waterwing", "Scorcher", "Snapmaw", "T-Rex", "T-Rex", "Dreadwing",
"Waterwing", "Snake", "T-Rex", "T-Rex", "Dreadwing", "Snapmaw", "Scorcher",
"Waterwing")
stack_df <- data.frame(Area = v, Location = w, Animals = y)
stack_df <- stack_df[order(stack_df$Area, stack_df$Location, stack_df$Animals), ]
row.names(stack_df) <- 1:nrow(stack_df)
Using the tidyR guidebook, I have found that the command expand
in conjunction with the nesting
command (to keep only combinations that appear in the data) does not work. For example:
library(tidyr)
stack_df %>%
dplyr::group_by(Area) %>%
expand(nesting(Location, Animals, Animals))
will return only 11/14 rows.
I have tried multiple ways using the expand
and crossing
command. However, like the expand.grid
command, these commands give you all possible combinations.
Despite this, using the expand
command is the closest I have gotten to what I am aiming for.
stack_df %>%
dplyr::group_by(Area) %>%
expand(Location, Animals, Animals)
As you can see, all possibilities are included, which is not the desired result.
Any ideas on how I can get this done?
It sounds/looks to me like you want to find all combinations (within the Area/Location group) of pairs of Animals where the 1st Animal in the pair occurs on a row before the 2nd Animal in the pair.
We can do this by adding a row number index and doing a self-join with an inequality constraint on the row numbers. (This requires dplyr
version >= 1.1.0)
library(dplyr)
stack_df = stack_df |>
mutate(group_i = row_number(), .by = c(Area, Location))
stack_df |>
inner_join(
stack_df,
by = join_by(Area, Location, group_i < group_i),
suffix = c("..2", "..3")
) |>
select(-starts_with("group_"))
# Area Location Animals..2 Animals..3
# 1 Area_1 Forest Dreadwing Scorcher
# 2 Area_1 Forest Dreadwing Snapmaw
# 3 Area_1 Forest Dreadwing T-Rex
# 4 Area_1 Forest Dreadwing T-Rex
# 5 Area_1 Forest Dreadwing Waterwing
# 6 Area_1 Forest Dreadwing Waterwing
# 7 Area_1 Forest Scorcher Snapmaw
# 8 Area_1 Forest Scorcher T-Rex
# 9 Area_1 Forest Scorcher T-Rex
# 10 Area_1 Forest Scorcher Waterwing
# 11 Area_1 Forest Scorcher Waterwing
# 12 Area_1 Forest Snapmaw T-Rex
# 13 Area_1 Forest Snapmaw T-Rex
# 14 Area_1 Forest Snapmaw Waterwing
# 15 Area_1 Forest Snapmaw Waterwing
# 16 Area_1 Forest T-Rex T-Rex
# 17 Area_1 Forest T-Rex Waterwing
# 18 Area_1 Forest T-Rex Waterwing
# 19 Area_1 Forest T-Rex Waterwing
# 20 Area_1 Forest T-Rex Waterwing
# 21 Area_1 Forest Waterwing Waterwing
# 22 Area_2 Cave Dreadwing Scorcher
# 23 Area_2 Cave Dreadwing Snake
# 24 Area_2 Cave Dreadwing Snapmaw
# 25 Area_2 Cave Dreadwing T-Rex
# 26 Area_2 Cave Dreadwing T-Rex
# 27 Area_2 Cave Dreadwing Waterwing
# 28 Area_2 Cave Scorcher Snake
# 29 Area_2 Cave Scorcher Snapmaw
# 30 Area_2 Cave Scorcher T-Rex
# 31 Area_2 Cave Scorcher T-Rex
# 32 Area_2 Cave Scorcher Waterwing
# 33 Area_2 Cave Snake Snapmaw
# 34 Area_2 Cave Snake T-Rex
# 35 Area_2 Cave Snake T-Rex
# 36 Area_2 Cave Snake Waterwing
# 37 Area_2 Cave Snapmaw T-Rex
# 38 Area_2 Cave Snapmaw T-Rex
# 39 Area_2 Cave Snapmaw Waterwing
# 40 Area_2 Cave T-Rex T-Rex
# 41 Area_2 Cave T-Rex Waterwing
# 42 Area_2 Cave T-Rex Waterwing