
Understand relationship between multiple dummy variables in R


I have the following data frame:

id       dummy1      dummy2       dummy3      dummy4
2          1           1            1           0  
3          0           0            0           1  
4          1           1            1           0  
5          0           0            1           0

I am trying to come up with a way to see if certain dummy variables appear together more often than others. In this example, if dummy1 is 1, all other dummies are more likely to be 1, too. I tried calculating frequencies, but it becomes very inconvenient for more than two variables.

To give you more context, the dummies stand for different products purchased at a supermarket. I am trying to see if a person who purchases one product (tomatoes, for instance) is more likely to buy a different product (lettuce, etc.) with it.

Thank you!


Solution

  • I think you are looking for the correlation between the variables in your case. It is the easiest approach: it gives you a number between -1 and 1, where 1 means the two variables are identical, -1 means they behave oppositely, and 0 means they are uncorrelated, so they behave independently.

    There is a function for that, cor, and you can use it directly on your data.frame:

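    # Read the example data; the first line of the text holds the column names.
    plouf <- read.table(text = "id       dummy1      dummy2       dummy3      dummy4
    2          1           1            1           0
    3          0           0            0           1
    4          1           1            1           0
    5          0           0            1           0", header = TRUE)

    # Correlation matrix of the dummies, leaving out the id column.
    cor(plouf[, -1])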
    
               dummy1     dummy2     dummy3     dummy4
    dummy1  1.0000000  1.0000000  0.5773503 -0.5773503
    dummy2  1.0000000  1.0000000  0.5773503 -0.5773503
    dummy3  0.5773503  0.5773503  1.0000000 -1.0000000
    dummy4 -0.5773503 -0.5773503 -1.0000000  1.0000000
    

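    Here dummy1 and dummy2 are identical (correlation equal to 1). dummy4 tends to behave oppositely to dummy1 and dummy2 (about -0.58), while dummy1/dummy3 and dummy2/dummy3 tend to behave similarly (about 0.58). dummy3 and dummy4 are exact opposites in this sample (correlation equal to -1).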
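
Since the question is about which products are bought together, it may also help to look at raw co-occurrence counts alongside the correlations. The following is only a minimal sketch, not part of the original answer: it reuses the plouf data from above, and the names m, co_occurrence and cond_prop are made up for illustration. It assumes every dummy is 1 in at least one row; otherwise the division would produce NaN for that row.

```r
# Rebuild the example data from the question.
plouf <- read.table(text = "id dummy1 dummy2 dummy3 dummy4
2 1 1 1 0
3 0 0 0 1
4 1 1 1 0
5 0 0 1 0", header = TRUE)

# Keep only the 0/1 columns as a numeric matrix (drop the id column).
m <- as.matrix(plouf[, -1])

# Co-occurrence counts: entry [i, j] is the number of rows in which
# dummy i and dummy j are both 1; the diagonal holds each dummy's total.
co_occurrence <- crossprod(m)

# Conditional proportions: entry [i, j] is the share of rows with
# dummy i equal to 1 in which dummy j is also 1.
# Dividing by the diagonal recycles it down the rows, so row i is
# divided by the total count of dummy i.
cond_prop <- co_occurrence / diag(co_occurrence)

print(co_occurrence)
print(round(cond_prop, 2))
```

Unlike the correlation matrix, cond_prop is not symmetric. In this sample every row with dummy1 also has dummy3 (proportion 1), whereas only two of the three rows with dummy3 have dummy1 (about 0.67). With only four rows these numbers are purely illustrative.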