I have a dataset in which each ID is associated with one or more fruits. For each ID, I want to list every combination of 2 fruits, ignoring order (Apple-Banana should count as the same combination as Banana-Apple).
As an example:
ID | Fruit |
---|---|
1 | Apple |
1 | Banana |
1 | Blueberry |
2 | Apple |
3 | Orange |
3 | Banana |
3 | Apple |
3 | Blueberry |
What I want to create is:
ID | Combination |
---|---|
1 | Apple Banana |
1 | Apple Blueberry |
1 | Banana Blueberry |
2 | Apple |
3 | Banana Orange |
3 | Apple Orange |
3 | Blueberry Orange |
3 | Apple Banana |
3 | Banana Blueberry |
3 | Apple Blueberry |
The example dataset:
ID <- c(1,1,1,2,3,3,3,3)
Fruit <- c("Apple","Banana","Blueberry","Apple","Orange","Banana","Apple","Blueberry")
dataset <- data.frame(ID, Fruit)
For reference, here is a base R approach that loops over the IDs and uses combn():
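uniID <- unique(dataset$ID)
res <- NULL
# Loop over the actual ID values, so IDs need not be exactly 1..n
for (id in uniID) {
  sameIDdf <- dataset[dataset$ID == id, ]
  x <- nrow(sameIDdf)
  if (x > 1) {
    # Each row of comb holds the row indices of one unordered pair
    comb <- t(combn(seq_len(x), 2))
    for (i in seq_len(nrow(comb))) {
      res <- rbind(res, data.frame(
        ID = id,
        Combination = paste(sameIDdf[comb[i, 1], "Fruit"], sameIDdf[comb[i, 2], "Fruit"])
      ))
    }
  } else {
    # An ID with a single fruit has no pair, so keep the fruit on its own
    res <- rbind(res, data.frame(ID = id, Combination = sameIDdf[1, "Fruit"]))
  }
}
res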
Result:
ID Combination
<int> <fct>
1 Apple Banana
1 Apple Blueberry
1 Banana Blueberry
2 Apple
3 Orange Banana
3 Orange Apple
3 Orange Blueberry
3 Banana Apple
3 Banana Blueberry
3 Apple Blueberry
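
The loop above keeps each pair in row order, so ID 3 yields "Orange Banana" where the question asks for "Banana Orange". A more compact variant, sketched here as one possible alternative, sorts the fruits within each ID before calling combn() so that every pair has a single canonical spelling. The helper name `pair_fruits` and the result name `res2` are made up for this illustration. Dropping duplicate fruits within an ID with unique() is an assumption about the data, not something the question states.

```r
# Example data from the question
dataset <- data.frame(
  ID = c(1, 1, 1, 2, 3, 3, 3, 3),
  Fruit = c("Apple", "Banana", "Blueberry", "Apple",
            "Orange", "Banana", "Apple", "Blueberry"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: all unordered fruit pairs for one ID
pair_fruits <- function(fruits) {
  # Sorting fixes the order inside each pair (assumption: duplicates are dropped)
  fruits <- sort(unique(fruits))
  # A lone fruit has no pair, so it is returned unchanged
  if (length(fruits) < 2) return(fruits)
  # combn() passes each 2-element combination to paste()
  combn(fruits, 2, FUN = paste, collapse = " ")
}

# One character vector of combinations per ID
pairs_by_id <- lapply(split(dataset$Fruit, dataset$ID), pair_fruits)

# Flatten to one row per combination; note that ID comes back as character,
# because it is taken from the names produced by split()
res2 <- data.frame(
  ID = rep(names(pairs_by_id), lengths(pairs_by_id)),
  Combination = unlist(pairs_by_id, use.names = FALSE),
  stringsAsFactors = FALSE
)
res2
```

This avoids growing a data frame with rbind() inside a loop, which can get slow when there are many IDs. If the numeric type of ID matters, it can be converted back afterwards, for example with as.numeric(res2$ID).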