Thanks for any help! I have a data frame in R with two columns of categorical variables, like so:
rowA <- c("Square", "Circle", "Triangle", "Square", "Circle", "Triangle", "Square", "Circle", "Triangle")
rowB <- c("Circle", "Square", "Square", "Square", "Circle", "Circle", "Triangle", "Triangle", "Triangle")
df1 <- data.frame(rowA, rowB)
print(df1)
When we print it, it looks like this:
      rowA     rowB
1   Square   Circle
2   Circle   Square
3 Triangle   Square
4   Square   Square
5   Circle   Circle
6 Triangle   Circle
7   Square Triangle
8   Circle Triangle
9 Triangle Triangle
I want to count the frequency of each combination of categories in rowA and rowB. Here's what I'm hung up on -- the combinations are reversible, meaning "Square - Circle" is the same as "Circle - Square" for our purposes, and we want them to be summed together. The ideal output would look like this:
Pair Count
Square - Circle 2
Square - Triangle 2
Square - Square 1
Circle - Triangle 2
Circle - Circle 1
Triangle - Triangle 1
I'd be thrilled if anybody had any advice, thanks!
Edit: Post got flagged as a duplicate question, but I don't agree that the suggested posts adequately answered my question (hence I asked in the first place, after a lot of digging). Really appreciate the unique and easy answers here.
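We could reorder each row with pmin/pmax, so that the two values of a pair always appear in the same (alphabetical) order, and then count the resulting pairs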
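library(dplyr)
library(stringr)
# pmin()/pmax() pick the alphabetically smaller/larger value in each row,
# so "Square - Circle" and "Circle - Square" get the same label
df1 %>%
  count(Pair = str_c(pmin(rowA, rowB), ' - ',
                     pmax(rowA, rowB)), name = "Count")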
Output:
                 Pair Count
1     Circle - Circle     1
2     Circle - Square     2
3   Circle - Triangle     2
4     Square - Square     1
5   Square - Triangle     2
6 Triangle - Triangle     1
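If you'd rather not load any packages, the same pmin/pmax idea can be sketched in base R. This is only a rough alternative, not a replacement for the dplyr version above: the variable names a, b, pair and counts are made up for illustration, and the as.character() calls assume the columns might be factors (the data.frame() default before R 4.0), in which case pmin/pmax would otherwise fail.

```r
rowA <- c("Square", "Circle", "Triangle", "Square", "Circle", "Triangle", "Square", "Circle", "Triangle")
rowB <- c("Circle", "Square", "Square", "Square", "Circle", "Circle", "Triangle", "Triangle", "Triangle")
df1 <- data.frame(rowA, rowB)

# Assumption: the columns may be factors, so convert them to character first
a <- as.character(df1$rowA)
b <- as.character(df1$rowB)

# pmin()/pmax() work element-wise, putting each pair in alphabetical order
pair <- paste(pmin(a, b), pmax(a, b), sep = " - ")

# table() tallies the normalized labels; as.data.frame() turns the tally
# into a two-column data frame with columns Pair and Count
counts <- as.data.frame(table(Pair = pair), responseName = "Count")
print(counts)
```

With the example data this should give the same six pairs and counts as the dplyr output above. As in that version, the labels come out alphabetically, so the pair is shown as "Circle - Square" rather than "Square - Circle".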