Finding Combinations of two characters by id in a dataset in R


I have a dataset of IDs and fruits. For each ID, I want to list every combination of two fruits, without repetition (Apple-Banana should count as the same combination as Banana-Apple).

As an example:

ID Fruit
1 Apple
1 Banana
1 Blueberry
2 Apple
3 Orange
3 Banana
3 Apple
3 Blueberry

What I want to create is:

ID Combination
1 Apple Banana
1 Apple Blueberry
1 Banana Blueberry
2 Apple
3 Banana Orange
3 Apple Orange
3 Blueberry Orange
3 Apple Banana
3 Banana Blueberry
3 Apple Blueberry

The example dataset:

ID <- c(1,1,1,2,3,3,3,3)
Fruit <- c("Apple","Banana","Blueberry","Apple","Orange","Banana","Apple","Blueberry")
dataset <- data.frame(ID, Fruit)

Solution

  • Here is one way to do it in base R, looping over the IDs and pairing up the rows of each group with combn:

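    # Loop over the actual ID values, not over 1..n, so non-consecutive IDs work too.
    uniID = unique(dataset$ID)
    res = NULL
    for (id in uniID)
    {
        # All rows that belong to the current ID.
        sameIDdf = dataset[dataset$ID == id, ]
        x = nrow(sameIDdf)
        if (x > 1)
        {
            # Each row of comb holds the row indices of one unordered pair.
            comb = t(combn(1:x, 2))
            for (i in 1:nrow(comb))
            {
                res = rbind(res, data.frame(ID = id, Combination = paste(sameIDdf[comb[i, 1], 'Fruit'], sameIDdf[comb[i, 2], 'Fruit'])))
            }
        } else
        {
            # An ID with a single fruit is kept as-is.
            res = rbind(res, data.frame(ID = id, Combination = sameIDdf[1, 'Fruit']))
        }
    }
    res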
    

    Result:

    ID  Combination
    <int>   <fct>
    1   Apple Banana
    1   Apple Blueberry
    1   Banana Blueberry
    2   Apple
    3   Orange Banana
    3   Orange Apple
    3   Orange Blueberry
    3   Banana Apple
    3   Banana Blueberry
    3   Apple Blueberry
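
    The loop pairs fruits in the order the rows appear, so ID 3 gives "Orange Banana" where the question asks for "Banana Orange". If a fixed order within each pair matters, one option is to sort the fruits of each ID before pairing them. The following is a minimal base R sketch of that idea, not the original answer's method. The names pair_fruits, per_id and res2 are made up for illustration, and it assumes alphabetical order is the canonical order you want and that a repeated fruit within an ID should count only once.

```r
# Example data from the question.
dataset <- data.frame(
  ID    = c(1, 1, 1, 2, 3, 3, 3, 3),
  Fruit = c("Apple", "Banana", "Blueberry", "Apple",
            "Orange", "Banana", "Apple", "Blueberry"),
  stringsAsFactors = FALSE
)

# pair_fruits is a hypothetical helper name, not part of any package.
pair_fruits <- function(fruits) {
  # Sort first so a pair is always written in alphabetical order
  # ("Apple Banana", never "Banana Apple"); unique() drops repeats.
  fruits <- sort(unique(as.character(fruits)))
  # An ID with a single fruit is kept as-is, as in the question.
  if (length(fruits) < 2) return(fruits)
  # combn() applies paste() to every unordered pair of fruits.
  combn(fruits, 2, FUN = paste, collapse = " ")
}

# One character vector of combinations per ID.
per_id <- lapply(split(dataset$Fruit, dataset$ID), pair_fruits)

# Flatten the list back into a two-column data frame.
res2 <- data.frame(
  ID          = rep(as.numeric(names(per_id)), lengths(per_id)),
  Combination = unlist(per_id, use.names = FALSE),
  stringsAsFactors = FALSE
)
res2
```

    For the example data this gives the same ten rows as the loop, except that every pair for ID 3 is in alphabetical order (for instance "Banana Orange"). It also avoids growing the data frame with rbind inside a loop, which tends to get slow on larger datasets.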