First time posting here. Apologies if I have left out anything needed to solve my problem.
I have a matched case-control design where three 'younger' clinical cases have been age-matched to a 'younger' control group, and three 'older' cases have been matched to an 'older' control group. I am attempting to plot the control group distribution as a violin plot and overlay each corresponding matched case as a data point (my PhD supervisors recommend that each case has a unique shape and colour for its data point, to make the cases easier to follow across the series of violin plots).
My novice solution so far has been to create a data frame for each control group and an individual data frame for each case, then create the plots and add formatting details such as the shape and colour of the data points.
My code to set up the data frames, followed by an example plot:
#remove the cases and put into a separate data frame
case_1.1 <- FTD_data[1:1, ]
case_1.2 <- FTD_data[2:2, ]
case_1.3 <- FTD_data[3:3, ]
case_2.1 <- FTD_data[13:13, ]
case_2.2 <- FTD_data[14:14, ]
case_2.3 <- FTD_data[15:15, ]
#remove control groups and put into own group
young_controls <- FTD_data[4:12, ]
old_controls <- FTD_data[16:23, ]
#example plot
ggplot(data = young_controls, aes(x = strange_stories_ToM_mean, y = analysis_group, fill = analysis_group)) +
  geom_point(data = case_1.1, aes(x = strange_stories_ToM_mean, y = analysis_group, colour = "Case 1.1"),
             fill = "deeppink1", col = "deeppink1", pch = 21, size = 5) +
  labs(color = "Young cases") +
  geom_point(data = case_1.2, aes(x = strange_stories_ToM_mean, y = analysis_group, colour = "Case 1.2"),
             fill = "indianred3", col = "indianred3", pch = 24, size = 4) +
  geom_point(data = case_1.3, aes(x = strange_stories_ToM_mean, y = analysis_group, colour = "Case 1.3"),
             fill = "blueviolet", col = "blueviolet", pch = 22, size = 5,
             position = position_jitter(h = 0.09, w = 0.0)) +
  geom_violin(trim = FALSE,
              alpha = 0.2,
              draw_quantiles = c(0.25, 0.5, 0.75)) +
  theme_classic() +
  scale_fill_manual(values = c("gray90")) +
  guides(fill = "none")
One annoying issue I am having, though, is that data points from cases overlap (as in the plot below). I have tried "position=position_jitter(h=0.09,w=0.0)", but this moves the data point to a different place each time the plot is drawn, as jitter introduces random noise. I need something consistent and reproducible for positioning the overlapping points, as I will be lining up several plots in a paper. The overlapping points need to be stacked vertically.
Example of plot with overlap issue
I have also tried:
`position_jitter(width = NULL, height = NULL, seed = NA)`
but then receive the following error:
Error in `check_subclass()`:
! `stat` must be either a string or a Stat object, not an S3 object with class PositionJitter/Position/ggproto/gg
Any ideas on the overlap issue? Also, any feedback on how I have set up the data frames, and on whether I have gone about it in a sensible or a cumbersome way, would be welcome. It was the solution I found easiest for manipulating each data point separately.
Short answer: try position_dodge().
Longer answer:
Yes, making separate dataframes for each observation and manually setting aesthetics for each is a bit cumbersome! You generally want to keep values in the same dataframe, then just tell ggplot what dimensions are important and what aesthetics to map these to. In cases where individual observations are important, you can map an aesthetic to a unique subject id.
That said, it can be helpful to use separate dataframes when you want completely different geoms for different subsets -- such as violins for controls and points for cases -- so you were on the right track there.
library(ggplot2)
set.seed(22)

# fake data
cases <- data.frame(
  id = factor(1:6),
  strange_stories_ToM_mean = sample(6:8, 6, replace = TRUE),
  age = factor(c(rep("young", 3), rep("old", 3)))
)
controls <- data.frame(
  id = 7:23,
  strange_stories_ToM_mean = sample(c(6,6,7,7,7,7,7,7,7,8,8,8,9,9,9,9,9), 17),
  age = c(rep("young", 9), rep("old", 8))
)

ggplot(data = controls, aes(strange_stories_ToM_mean, age)) +
  geom_violin(
    trim = FALSE,
    alpha = 0.2,
    draw_quantiles = c(0.25, 0.5, 0.75),
    fill = "gray90"
  ) +
  geom_point(
    data = cases,
    aes(colour = id, shape = id),           # map color/shape to individual cases
    position = position_dodge(width = .2),  # spread cases apart to avoid overplotting
    size = 5,
    show.legend = FALSE
  ) +
  theme_classic()
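If you would rather keep jitter, it can also be made reproducible. In the signature quoted in the question, seed = NA is the default and means "pick a new random seed each time", so copying it does not fix the positions; supplying an integer does. The `stat` error is probably a separate problem: one plausible cause (an assumption, since the failing call is not shown) is that the position object was passed without the position = name and so landed in the stat argument of geom_point(). A minimal sketch with made-up data, where the seed value 1 is arbitrary:

```r
library(ggplot2)

# Hypothetical data: three cases with the same score, so they overplot
cases <- data.frame(
  id = factor(1:3),
  strange_stories_ToM_mean = c(7, 7, 7),
  age = "young"
)

p <- ggplot(cases, aes(strange_stories_ToM_mean, age)) +
  geom_point(
    aes(colour = id, shape = id),
    # pass the position by name; an integer seed fixes the random offsets
    position = position_jitter(width = 0, height = 0.09, seed = 1),
    size = 5
  ) +
  theme_classic()

# Extract the computed point positions twice to compare them
d1 <- layer_data(p)
d2 <- layer_data(p)
same_positions <- identical(d1$y, d2$y)
```

With a fixed seed the two builds should give identical offsets. position_dodge() is still the simpler option when you want evenly spaced points rather than random ones.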
PS - if you still want to specify particular colors or shapes for each case, you can use scale_color_manual() and scale_shape_manual().
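As a rough sketch of what that could look like, here are the three young cases with the colours and shapes from the question. The data values and the id labels are made up for illustration. Shapes 21 to 25 take their interior colour from fill, so fill is mapped to id as well; drop that mapping if hollow symbols are fine.

```r
library(ggplot2)

# Hypothetical data for the three young cases
cases <- data.frame(
  id = factor(c("Case 1.1", "Case 1.2", "Case 1.3")),
  strange_stories_ToM_mean = c(6, 7, 7),
  age = "young"
)

# Named vectors tie each case to its colour and shape (values from the question)
case_colours <- c("Case 1.1" = "deeppink1", "Case 1.2" = "indianred3", "Case 1.3" = "blueviolet")
case_shapes  <- c("Case 1.1" = 21, "Case 1.2" = 24, "Case 1.3" = 22)

p <- ggplot(cases, aes(strange_stories_ToM_mean, age)) +
  geom_point(
    aes(colour = id, fill = id, shape = id),
    position = position_dodge(width = 0.2),
    size = 5
  ) +
  scale_color_manual(values = case_colours, name = "Young cases") +
  scale_fill_manual(values = case_colours, name = "Young cases") +
  scale_shape_manual(values = case_shapes, name = "Young cases") +
  theme_classic()

p
```

Giving all three scales the same name lets ggplot2 merge them into a single legend. Because the vectors are named, each case keeps the same colour and shape in every plot of the series, whatever order the rows are in.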