In R I'm stuck on how to create the following table of data and chart. I would like to take this set of data
person_id | Condition |
---|---|
1 | asthma |
1 | heart_disease |
1 | cancer |
2 | cancer |
3 | eczema |
3 | back_pain |
3 | cancer |
4 | asthma |
4 | back_pain |
4 | eczema |
4 | hypertension |
4 | diabetes |
4 | heart_disease |
5 | heart_disease |
6 | asthma |
6 | back_pain |
7 | heart_disease |
10 | asthma |
and organise it so it looks like this
where I have just done it for three medical conditions.
Then I'd like to create a heat map to illustrate the data visually. Something along these lines.
Any help would be gratefully received. Thank you!
To get a heat map showing how often diagnoses occur together you could first left_join
your data on itself by person_id
and second use count
to get the counts. Finally, you could use a geom_tile
to create your heatmap.
Note: I dropped the diagonal row.
library(dplyr, warn = FALSE)
library(ggplot2)
dat |>
left_join(dat,
by = "person_id",
relationship = "many-to-many",
suffix = c("1", "2")
) |>
# Drop diagonal elements
filter(
Condition1 != Condition2, .by = person_id
) |>
count(Condition1, Condition2) |>
ggplot(aes(Condition1, Condition2, fill = n)) +
geom_tile()
DATA
dat <- data.frame(
stringsAsFactors = FALSE,
person_id = c(
1L, 1L, 1L, 2L, 3L, 3L, 3L, 4L,
4L, 4L, 4L, 4L, 4L, 5L, 6L, 6L, 7L, 10L
),
Condition = c(
"asthma", "heart_disease",
"cancer", "cancer", "eczema", "back_pain", "cancer", "asthma",
"back_pain", "eczema", "hypertension", "diabetes",
"heart_disease", "heart_disease", "asthma", "back_pain",
"heart_disease", "asthma"
)
)