So I'm quite new to R, and since googling and browsing questions on here has not helped me so far, I decided to post my own.
For descriptive statistics I would like to have a geom_bar plot. My data frame consists of 21 IDs, and each of them has one or more diagnoses. The IDs are simply numbers from 1 to 21, and each diagnosis is coded as 0 or 1 (no and yes). The code I have so far plots bars next to each other, but instead of counting the number of cases per group, it plots the number of people per group. So for each diagnosis I get two bars which always represent the number of people per group (attempters vs. non-attempters) and not the number of cases.
My old data frame looks something like this:
code | MDD | Anxiety | PTBS | age | attempters |
---|---|---|---|---|---|
01 | 0 | 1 | 1 | 17 | 1 |
02 | 1 | 1 | 0 | 53 | 0 |
03 | 0 | 0 | 1 | 32 | 0 |
04 | 0 | 1 | 0 | 60 | 0 |
but with a lot of columns I don't actually need for my thesis.
First I changed my data from wide to long and kept only the columns I need:
df_long <- data_gesamt %>%
  select(code, MDD, Anxiety, PTBS, attempters) %>%
  group_by(code, attempters) %>%
  tidyr::gather(key = predictors,
                value = severity,
                MDD, Anxiety, PTBS) %>%
  mutate(attempters = as.factor(attempters)) %>%
  drop_na(attempters)
which got me a data frame as follows:
code | attempters | predictors | severity |
---|---|---|---|
01 | 1 | MDD | 0 |
02 | 0 | MDD | 1 |
03 | 0 | MDD | 0 |
04 | 0 | MDD | 0 |
01 | 1 | Anxiety | 1 |
02 | 0 | Anxiety | 1 |
03 | 0 | Anxiety | 0 |
04 | 0 | Anxiety | 1 |
01 | 1 | PTBS | 1 |
02 | 0 | PTBS | 0 |
03 | 0 | PTBS | 1 |
04 | 0 | PTBS | 0 |
01 | 1 | age | 17 |
02 | 0 | age | 53 |
03 | 0 | age | 32 |
04 | 0 | age | 60 |
and then used the following to plot:
plot <- df_long %>%
  ggplot(aes(x = attempters, fill = attempters)) +
  geom_bar() +
  facet_grid(. ~ predictors) +
  theme(legend.position = "bottom")
plot
I would need to have a count of how many people with MDD, Anxiety and PTBS I have per group and the mean of the age (I could leave this one out though). So far I get the number of people per group (non-attempters vs. attempters) ... What am I missing or what is wrong?
[Updated: revised the part that misunderstood the nature of the data]
I recommend summarizing the data before you graph it, so that ggplot only has to handle the visualization and not any additional calculation. That way you can also double-check that the computed statistic is the one you want, instead of leaving all the calculation to ggplot behind the scenes.
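To illustrate that kind of double-check, here is a minimal sketch that tabulates what the default count stat of `geom_bar()` sees (rows, so people) next to the number of cases the question asks for. The names `toy`, `toy_long`, `people` and `cases` are made up for this illustration, and the data are only the four example rows from the question.

```r
library(dplyr)
library(tidyr)

# Toy data: the four example rows from the question (assumption: the
# real data frame has the same 0/1 coding for all 21 IDs)
toy <- data.frame(
  code       = 1:4,
  MDD        = c(0, 1, 0, 0),
  Anxiety    = c(1, 1, 0, 1),
  PTBS       = c(1, 0, 1, 0),
  attempters = c(1, 0, 0, 0)
)

toy_long <- toy %>%
  pivot_longer(cols = MDD:PTBS,
               names_to = "predictors", values_to = "severity")

# What geom_bar() counts by default: one row per ID and predictor,
# regardless of whether the diagnosis is 0 or 1
people <- toy_long %>%
  count(predictors, attempters, name = "n_people")

# What the question asks for: only the rows where the diagnosis is present
cases <- toy_long %>%
  group_by(predictors, attempters) %>%
  summarize(n_cases = sum(severity), .groups = "drop")

people
cases
```

In `people` every predictor shows the same group sizes (3 non-attempters, 1 attempter), which is the pattern described in the question. `cases` differs per predictor and keeps groups with zero cases, since it sums the 0/1 values instead of filtering them.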
library(dplyr)
library(tidyr)
library(ggplot2)
# dput of your original table
df <- structure(list(code = 1:4, MDD = c(0L, 1L, 0L, 0L), Anxiety = c(1L,
1L, 0L, 1L), PTBS = c(1L, 0L, 1L, 0L), age = c(17L, 53L, 32L,
60L), attempters = c(1L, 0L, 0L, 0L)), row.names = c(NA, -4L),
class = "data.frame")
# Count the cases per predictor and attempter group
graph_data <- df %>%
  # pivot to long format and keep only the rows where
  # the diagnosis is present (value == 1)
  pivot_longer(cols = MDD:PTBS,
               names_to = "predictors", values_to = "value") %>%
  filter(value == 1) %>%
  # convert attempters to a factor since it only takes the values 0 and 1;
  # ggplot would treat a numeric value as continuous
  group_by(predictors, attempters = factor(attempters)) %>%
  summarize(severity = n(),
            mean_age = mean(age), .groups = "drop")
# data after summarizing
graph_data
#> # A tibble: 5 x 4
#> predictors attempters severity mean_age
#> <chr> <fct> <int> <dbl>
#> 1 Anxiety 0 2 56.5
#> 2 Anxiety 1 1 17
#> 3 MDD 0 1 53
#> 4 PTBS 0 1 32
#> 5 PTBS 1 1 17
# the plot now takes graph_data as input
ggplot(data = graph_data) +
  # I prefer to do the mapping per geom instead of in the ggplot() call.
  geom_bar(mapping = aes(
    x = attempters,
    # here the y value is the case count (severity) calculated earlier
    y = severity,
    # when using fill/colors I prefer to specify the group explicitly,
    # even though ggplot can infer it
    fill = attempters, group = attempters),
    # stat is "identity" here instead of the default "count"
    stat = "identity",
    # position_dodge() keeps the bars from being stacked on each other
    position = position_dodge()) +
  facet_grid(. ~ predictors) +
  theme(legend.position = "bottom")
Created on 2021-04-17 by the reprex package (v2.0.0)
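The question also mentions the mean age per group. One possible way to handle it, sketched under the assumption that the mean should be taken over all people in each group (not only those with a diagnosis), is a separate summary and a separate plot, since age is on a different scale than the case counts. `age_data` and `age_plot` are names chosen for this example only.

```r
library(dplyr)
library(ggplot2)

# Same four example rows as in the question
df <- data.frame(
  code       = 1:4,
  MDD        = c(0L, 1L, 0L, 0L),
  Anxiety    = c(1L, 1L, 0L, 1L),
  PTBS       = c(1L, 0L, 1L, 0L),
  age        = c(17L, 53L, 32L, 60L),
  attempters = c(1L, 0L, 0L, 0L)
)

# One row per group: mean age and group size
age_data <- df %>%
  group_by(attempters = factor(attempters)) %>%
  summarize(mean_age = mean(age), n_people = n(), .groups = "drop")

# geom_col() plots the precomputed value as is
# (equivalent to geom_bar(stat = "identity"))
age_plot <- ggplot(age_data,
                   aes(x = attempters, y = mean_age, fill = attempters)) +
  geom_col() +
  theme(legend.position = "bottom")

age_plot
```

Keeping age out of the long table avoids mixing a continuous variable with the 0/1 diagnosis columns in the same facets.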