Search code examples
rggplot2matrixcorrelation

Heatmap of like against like (actuals not correlation coefficients)


In R I'm stuck on how to create the following table of data and chart. I would like to take this set of data

person_id Condition
1 asthma
1 heart_disease
1 cancer
2 cancer
3 eczema
3 back_pain
3 cancer
4 asthma
4 back_pain
4 eczema
4 hypertension
4 diabetes
4 heart_disease
5 heart_disease
6 asthma
6 back_pain
7 heart_disease
10 asthma

and organise it so it looks like this

enter image description here

where I have just done it for three medical conditions.

Then I'd like to create a heat map to illustrate the data visually. Something along these lines. enter image description here

Any help would be gratefully received. Thank you!


Solution

  • To get a heat map showing how often diagnoses occur together you could first left_join your data on itself by person_id and second use count to get the counts. Finally, you could use a geom_tile to create your heatmap.

    Note: I dropped the diagonal row.

    library(dplyr, warn = FALSE)
    library(ggplot2)
    
    dat |>
      left_join(dat,
        by = "person_id", 
        relationship = "many-to-many",
        suffix = c("1", "2")
      ) |>
      # Drop diagonal elements
      filter(
        Condition1 != Condition2, .by = person_id
      ) |>
      count(Condition1, Condition2) |> 
      ggplot(aes(Condition1, Condition2, fill = n)) +
      geom_tile()
    

    DATA

    dat <- data.frame(
      stringsAsFactors = FALSE,
      person_id = c(
        1L, 1L, 1L, 2L, 3L, 3L, 3L, 4L,
        4L, 4L, 4L, 4L, 4L, 5L, 6L, 6L, 7L, 10L
      ),
      Condition = c(
        "asthma", "heart_disease",
        "cancer", "cancer", "eczema", "back_pain", "cancer", "asthma",
        "back_pain", "eczema", "hypertension", "diabetes",
        "heart_disease", "heart_disease", "asthma", "back_pain",
        "heart_disease", "asthma"
      )
    )