I am trying to create a dataset that lists all combinations of rows by ID, but ignores the order of combinations (i.e. Apple -> Orange is the same as Orange -> Apple). I started by using this question, which gets me all combinations, keeping different orders. How can I modify this code for my desired output? Or should I being using a different approach altogether, and what would the approach be?
Code of what I currently do using linked question:
clear all
input id str10 fruit
1 "Apple"
1 "Banana"
1 "Orange"
2 "Orange"
2 "Apple"
3 "Pear"
3 "Kiwi"
3 "Apple"
3 "Lemon"
end
tempfile t1
save `t1'
clear
use `t1'
rename fruit f2
keep id f2
joinby id using `t1'
order id fruit f2
sort id fruit f2
drop if fruit==f2
list, sepby(id)
My desired output is:
ID fruit f2
1 Apple Banana
1 Apple Orange
1 Banana Orange
2 Orange Apple
3 Pear Kiwi
3 Pear Apple
3 Pear Lemon
3 Kiwi Apple
3 Kiwi Lemon
3 Apple Lemon
After joinby
, you could generate an auxiliary variable which puts fruit and f2 in one variable and sorts them, assuring that the same combinations have the same values. Then you can use duplicates drop
in terms of this variable and id to drop duplicates.
clear
input id str10 fruit
1 "Apple"
1 "Banana"
1 "Orange"
2 "Orange"
2 "Apple"
3 "Pear"
3 "Kiwi"
3 "Apple"
3 "Lemon"
end
tempfile t1
save `t1'
rename fruit f2
joinby id using `t1'
order id fruit f2
sort id fruit f2
drop if fruit==f2
gen combination = cond(fruit < f2, fruit + " " + f2, f2 + " " + fruit)
duplicates drop id combination, force
drop combination
list, sepby(id)
+----------------------+
| id fruit f2 |
|----------------------|
1. | 1 Apple Banana |
2. | 1 Apple Orange |
3. | 1 Banana Orange |
|----------------------|
4. | 2 Apple Orange |
|----------------------|
5. | 3 Apple Kiwi |
6. | 3 Apple Lemon |
7. | 3 Apple Pear |
8. | 3 Kiwi Lemon |
9. | 3 Kiwi Pear |
10. | 3 Lemon Pear |
+----------------------+