Newbie here! After a long search I still could not find a satisfying solution to my problem. I have a dataset of heart failure rates (https://archive.ics.uci.edu/ml/datasets/Heart+failure+clinical+records) and I would like to display a series of geom_bar plots where "Survived" and "Dead" are counted per category (i.e. sex, smoking and so on).
I think I have done a decent job at preparing the plots, and they look right to me. The problem is that it is difficult to see the ratio between surviving and dying patients with different characteristics.
I have two ideas, but both of them elude me:
Here is the code I wrote.
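library(ggplot2)  # the package is named ggplot2, not ggplot
library(dplyr)    # provides %>% and select()
library(tidyr)    # provides gather()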
heart_faliure_data <- read.csv(file = "heart_failure_clinical_records_dataset.csv", header = FALSE, skip=1)
#Prepare Column Names
c_names <- c("Age",
"Anaemia",
"creatinine_phosphokinase",
"diabetes",
"ejection_fraction",
"high_blood_pressure",
"platelets",
"serum_creatinine",
"serum_sodium",
"sex",
"smoking",
"time",
"DEATH_EVENT")
#Apply column names to the dataframe
colnames(heart_faliure_data) <- c_names
# Some columns (sex, Anaemia, diabetes, high_blood_pressure, smoking and DEATH_EVENT) are booleans
# (see description of the dataset) and should be transformed into factors
heart_faliure_data$sex <- factor(heart_faliure_data$sex,
levels=c(0,1),
labels=c("Female","Male"))
heart_faliure_data$smoking <- factor(heart_faliure_data$smoking,
levels=c(0,1),
labels=c("No","Yes"))
heart_faliure_data$DEATH_EVENT <- factor(heart_faliure_data$DEATH_EVENT,
levels=c(0,1),
labels=c("Survived","Died"))
heart_faliure_data$high_blood_pressure <- factor(heart_faliure_data$high_blood_pressure,
levels=c(0,1),
labels=c("No","Yes"))
heart_faliure_data$Anaemia <- factor(heart_faliure_data$Anaemia,
levels=c(0,1),
labels=c("No","Yes"))
heart_faliure_data$diabetes <- factor(heart_faliure_data$diabetes,
levels=c(0,1),
labels=c("No","Yes"))
# Convert Age to an integer value
heart_faliure_data$Age <- as.integer(heart_faliure_data$Age)
# Select the categorical variables to study the effect of each one on the death event
categorical.heart_failure <- heart_faliure_data %>%
select(Anaemia,
diabetes,
high_blood_pressure,
sex,
smoking,
DEATH_EVENT) %>%
gather(key = "key", value = "value", -DEATH_EVENT)
#Visualizing this effect with a grouped barplot
categorical.heart_failure %>%
ggplot(aes(value)) +
geom_bar(aes(x = value,
fill = DEATH_EVENT),
alpha = .2,
position = "dodge",
color = "black",
width = .7,
stat = "count") +
labs(x = "",
y = "") +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank()) +
facet_wrap(~ key,
scales = "free",
nrow = 4) +
scale_fill_manual(values = c("#FFA500", "#0000FF"),
name = "Death Event",
labels = c("Survived", "Dead"))
And here is a (not so bad) image of the result:
The goal would be to have some numerical value on top of the bars, or even just a y-axis indication...
I would be glad about any help you can give me!
What about something like this? To make it work, I aggregated the data first:
tmp <- categorical.heart_failure %>%
group_by(DEATH_EVENT, key, value) %>%
summarise(n = n())
#Visualizing this effect with a grouped barplot
tmp %>%
ggplot(aes(x = value, y=n)) +
geom_bar(aes(fill = DEATH_EVENT),
alpha = .2,
position = position_dodge(width=1),
color = "black",
width = .7,
stat = "identity") +
geom_text(aes(x=value, y=n*1.1, label = n, group=DEATH_EVENT), position = position_dodge(width=1), vjust=0) +
labs(x = "",
y = "") +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank()) +
facet_wrap(~ key,
scales = "free",
nrow = 4) +
scale_fill_manual(values = c("#FFA500", "#0000FF"),
name = "Death Event",
labels = c("Survived", "Dead")) +
coord_cartesian(ylim=c(0, max(tmp$n)*1.25))
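Since the question is really about the ratio between surviving and dying patients, the share within each category level could go into the label next to the count. Below is a minimal sketch of that idea, not a definitive solution. It assumes a table shaped like `tmp` above; `tmp_demo` is a made-up stand-in with invented counts (not the real dataset) so the snippet runs on its own, and `tmp_pct`, `pct` and `label` are names introduced only for this illustration.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for `tmp` (invented counts, NOT the real data).
tmp_demo <- data.frame(
  DEATH_EVENT = factor(rep(c("Survived", "Died"), times = 4),
                       levels = c("Survived", "Died")),
  key   = rep(c("sex", "smoking"), each = 4),
  value = rep(c("Female", "Male", "No", "Yes"), each = 2),
  n     = c(30L, 10L, 45L, 15L, 40L, 20L, 24L, 16L)
)

# Share of each outcome within one level of one variable,
# e.g. the fraction of smokers who died.
tmp_pct <- tmp_demo %>%
  group_by(key, value) %>%
  mutate(pct = n / sum(n)) %>%
  ungroup() %>%
  mutate(label = sprintf("%d (%.0f%%)", n, 100 * pct))

# Same dodged bars as before, labelled with count and percentage.
p <- ggplot(tmp_pct, aes(x = value, y = n, fill = DEATH_EVENT)) +
  geom_col(position = position_dodge(width = 0.9),
           color = "black", alpha = .2) +
  geom_text(aes(label = label, group = DEATH_EVENT),
            position = position_dodge(width = 0.9),
            vjust = -0.5, size = 3) +
  facet_wrap(~ key, scales = "free_x") +
  # leave some headroom above the tallest bar for the labels
  scale_y_continuous(expand = expansion(mult = c(0, 0.15))) +
  labs(x = "", y = "Patients")

print(p)
```

To try this on the real data, replace `tmp_demo` with `tmp`. Grouping by `key` and `value` only (without `DEATH_EVENT`) is what makes the two percentages of each bar pair add up to 100%.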