I am creating a scatter plot using ggplot2 and would like to size the points showing group means in proportion to the sample size used to calculate each mean. This is my code, where I used fun.y to calculate the mean by group Trt:
branch1 %>%
ggplot() + aes(x=Branch, y=Flow_T, group=Trt, color=Trt) +
stat_summary(aes(group=Trt), fun.y=mean, geom="point", size=)
I am relatively new to R, but my guess is to use size in the aes function to resize my points. I thought it might be a good idea to extract the sample sizes used in fun.y=mean and create a new class that could be passed to size; however, I am not sure how to do that.
Any help will be greatly appreciated! Cheers.
EDIT
Here's my data for reference:
Plant Branch Pod_B Flow_Miss Pod_A Flow_T Trt Dmg
<int> <dbl> <int> <int> <int> <dbl> <fct> <int>
1 1 1.00 0 16 20 36.0 Early 1
2 1 2.00 0 1 17 18.0 Early 1
3 1 3.00 0 0 17 17.0 Early 1
4 1 4.00 0 3 14 17.0 Early 1
5 1 5.00 5 2 4 11.0 Early 1
6 1 6.00 0 3 7 10.0 Early 1
7 1 7.00 0 4 6 10.0 Early 1
8 1 8.00 0 13 6 19.0 Early 1
9 1 9.00 0 2 7 9.00 Early 1
10 1 10.0 0 2 3 5.00 Early 1
EDIT 2:
Here is a graph of what I'm trying to achieve with proportional sizing by sample size n per Trt (treatment), where the mean is calculated per Trt and Branch number. I'm wondering if I should make Branch a categorical variable.
If I understood you correctly, you would like to scale the size of points based on the number of points per Trt group.
How about something like this? Note that I appended a row to your sample data, because Trt contains only Early entries.
df %>%
    group_by(Trt) %>%
    mutate(ssize = n()) %>%
    ggplot(aes(x = Branch, y = Flow_T, colour = Trt, size = ssize)) +
    geom_point()
Explanation: We group by Trt, then calculate the number of samples per group as ssize, and plot with the argument aes(...., size = ssize) to ensure that the size of the points scales with ssize. You don't need the group aesthetic here.
To scale points according to the mean of Flow_T per Trt we can do:
df %>%
    group_by(Trt) %>%
    mutate(
        ssize = n(),
        mean.Flow_T = mean(Flow_T)) %>%
    ggplot(aes(x = Branch, y = Flow_T, colour = Trt, size = mean.Flow_T)) +
    geom_point()
# Sample data
df <- read.table(text =
"Plant Branch Pod_B Flow_Miss Pod_A Flow_T Trt Dmg
1 1 1.00 0 16 20 36.0 Early 1
2 1 2.00 0 1 17 18.0 Early 1
3 1 3.00 0 0 17 17.0 Early 1
4 1 4.00 0 3 14 17.0 Early 1
5 1 5.00 5 2 4 11.0 Early 1
6 1 6.00 0 3 7 10.0 Early 1
7 1 7.00 0 4 6 10.0 Early 1
8 1 8.00 0 13 6 19.0 Early 1
9 1 9.00 0 2 7 9.00 Early 1
10 1 10.0 0 2 3 5.00 Early 1
11 1 10.0 0 2 3 20.00 Late 1", header = TRUE)
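
If the goal is one point per mean, sized by the number of observations behind that mean (as described in EDIT 2 of the question), one option might be to summarise first and then plot the summary. This is only a minimal sketch under some assumptions: the data frame below re-creates just three columns of the sample data, the names df_means, mean_Flow_T and n are made up for illustration, and .groups = "drop" assumes dplyr 1.0.0 or later.

```r
library(dplyr)
library(ggplot2)

# Three columns of the sample data (Branch, Flow_T, Trt), re-created inline
# so that this block runs on its own
df <- data.frame(
    Branch = c(1:10, 10),
    Flow_T = c(36, 18, 17, 17, 11, 10, 10, 19, 9, 5, 20),
    Trt    = c(rep("Early", 10), "Late")
)

# One row per Trt/Branch combination: the mean of Flow_T and the
# number of observations (n) that went into that mean
df_means <- df %>%
    group_by(Trt, Branch) %>%
    summarise(mean_Flow_T = mean(Flow_T), n = n(), .groups = "drop")

# Plot the means; point size is mapped to the per-group sample size
p <- ggplot(df_means, aes(x = Branch, y = mean_Flow_T, colour = Trt, size = n)) +
    geom_point()
p
```

In this small sample every Trt/Branch combination has a single row, so all points come out the same size; with the full data the sizes should vary with n.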