Tags: r, graphics, machine-learning, statistics, survey

How can I visually represent a contingency table with multiple variables as a decision tree in R?


For example, suppose I ask a respondent whether s/he has a disease. I then ask whether her/his father has had the disease. If so, I ask whether the father is now cured; if the father has not had the disease, that question is not applicable.

Can I create such a "decision tree" in R or elsewhere?

Here is usable data, where 0 means "no" and 1 means "yes":

    person_disease <- c(rep(1, 10), rep(0, 20))
    father_disease <- c(rep(1, 7), rep(0, 18), rep(1, 5))
    father_cured <- c(rep(0, 4), rep(1, 3), rep(NA, 18), rep(1, 5))
    df <- data.frame(person_disease, father_disease, father_cured)



Solution

  • You can use the data.tree package for that. There are many ways to do what you want. For example:

    person_disease <- c(rep(1, 10), rep(0, 20))
    father_disease <- c(rep(1, 7), rep(0,18), rep(1,5))
    father_cured <- c( rep(0, 4), rep(1,3), rep(NA,18),rep(1,5)  )
    df <- data.frame(person_disease, father_disease, father_cured)
    
    library(data.tree)
    
    #here, the tree is constructed "manually"
    #however, depending on your data and your needs, you might want to generate the tree directly from the data
    #many examples for this are available in the vignettes, see browseVignettes("data.tree")
    disease <- Node$new("Disease", data = df)
    father_disease_yes <- disease$AddChild("Father Disease Yes", label = "Father Disease", edge = "yes", condition = function(df) df[df$person_disease == 1,])
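    #which() drops the rows where father_cured is NA (not applicable); a bare logical index would return them as all-NA rows
    father_cured_yes <- father_disease_yes$AddChild("Father Cured Yes", label = "Father Cured", edge = "yes", condition = function(df) df[which(df$father_cured == 1),])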
    father_disease_no <- disease$AddChild("Father Disease No", label = "Father Disease", edge = "no", condition = function(df) df[df$person_disease == 0,])
    
    
    #data filter (pre-order)
    #an alternative would be to do this recursively
    disease$Do(function(node) {
      for (child in node$children) {
        child$data <- child$condition(node$data)
      }
    })
    
    print(disease, total = function(node) nrow(node$data))
    
    
    #plotting
    #(many more options are available, see ?plot.Node)
    SetEdgeStyle(disease,
                 fontname = "helvetica",
                 arrowhead = "none",
                 label = function(node) paste0(node$edge, "\n", "total = ", nrow(node$data)))
    
    SetNodeStyle(disease,
                 fontname = "helvetica",
                 label = function(node) node$label)
    
    plot(disease)
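
The totals shown on the tree can be cross-checked against a plain contingency table. The following is a minimal base-R sketch, not part of the data.tree approach above; the names `counts` and `paths` are only illustrative. It tabulates all three questions at once and keeps the answer combinations that actually occur, treating NA ("not applicable") as its own level.

```r
# Data from the question: 0 = "no", 1 = "yes", NA = not applicable
person_disease <- c(rep(1, 10), rep(0, 20))
father_disease <- c(rep(1, 7), rep(0, 18), rep(1, 5))
father_cured   <- c(rep(0, 4), rep(1, 3), rep(NA, 18), rep(1, 5))
df <- data.frame(person_disease, father_disease, father_cured)

# Cross-tabulate the three variables; useNA = "ifany" adds an NA level
# only to variables that contain missing values (here: father_cured)
counts <- as.data.frame(table(df, useNA = "ifany"))

# Keep only the combinations of answers that occur in the data
paths <- counts[counts$Freq > 0, ]
print(paths, row.names = FALSE)
```

Each row of `paths` corresponds to one root-to-leaf route through the questions, so the `Freq` column should add up to the number of respondents (30 here).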
    

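The comments in the code above mention that the data filter could also be done recursively. As a rough illustration of that idea, here is a package-free sketch that uses nested lists instead of data.tree nodes. The helpers `make_node` and `count_nodes` and the field `keep` are hypothetical names introduced for this example. It also assumes that each branch filters on the answer to the parent's question (so the "yes" branch below Father Disease filters on `father_disease`), which is one reading of the question's flow and not necessarily the layout used in the answer above.

```r
# Data from the question: 0 = "no", 1 = "yes", NA = not applicable
person_disease <- c(rep(1, 10), rep(0, 20))
father_disease <- c(rep(1, 7), rep(0, 18), rep(1, 5))
father_cured   <- c(rep(0, 4), rep(1, 3), rep(NA, 18), rep(1, 5))
df <- data.frame(person_disease, father_disease, father_cured)

# Hypothetical helper: a node is a list holding a label, a row filter
# and a list of child nodes (the default filter keeps every row)
make_node <- function(label, keep = function(d) d, children = list()) {
  list(label = label, keep = keep, children = children)
}

tree <- make_node("Disease", children = list(
  make_node("yes -> Father Disease",
            keep = function(d) d[which(d$person_disease == 1), ],
            children = list(
              make_node("yes -> Father Cured",
                        keep = function(d) d[which(d$father_disease == 1), ]),
              make_node("no -> not applicable",
                        keep = function(d) d[which(d$father_disease == 0), ])
            )),
  make_node("no -> Father Disease",
            keep = function(d) d[which(d$person_disease == 0), ])
))

# Recursively apply each node's filter to the rows passed down by its
# parent and collect one row (label, total) per node, in pre-order
count_nodes <- function(node, data, depth = 0) {
  data <- node$keep(data)
  here <- data.frame(node = paste0(strrep("  ", depth), node$label),
                     total = nrow(data))
  below <- lapply(node$children, count_nodes, data = data, depth = depth + 1)
  do.call(rbind, c(list(here), below))
}

totals <- count_nodes(tree, df)
print(totals, row.names = FALSE)
```

Because each call passes its already-filtered rows to its children, no separate traversal step is needed. The same pattern should carry over to data.tree by recursing over `node$children`.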