
Sizing scatter plot point means proportional to sample size


I am creating a scatter plot using ggplot2 and would like to size my point means proportional to the sample size used to calculate the mean. This is my code, where I used fun.y to calculate the mean by group Trt:

branch1 %>%
ggplot() + aes(x=Branch, y=Flow_T, group=Trt, color=Trt) +
stat_summary(aes(group=Trt), fun.y=mean, geom="point", size=)

I am relatively new to R, but my guess is that I should use size inside the aes function to resize my points. I thought it might be a good idea to extract the sample sizes used in fun.y=mean and create a new variable that could be passed to size; however, I am not sure how to do that.

Any help will be greatly appreciated! Cheers.

EDIT

Here's my data for reference:

Plant Branch Pod_B Flow_Miss Pod_A Flow_T Trt     Dmg
<int>  <dbl> <int>     <int> <int>  <dbl> <fct> <int>
1     1   1.00     0        16    20  36.0  Early     1
2     1   2.00     0         1    17  18.0  Early     1
3     1   3.00     0         0    17  17.0  Early     1
4     1   4.00     0         3    14  17.0  Early     1
5     1   5.00     5         2     4  11.0  Early     1
6     1   6.00     0         3     7  10.0  Early     1
7     1   7.00     0         4     6  10.0  Early     1
8     1   8.00     0        13     6  19.0  Early     1
9     1   9.00     0         2     7   9.00 Early     1
10     1  10.0      0         2     3   5.00 Early     1

EDIT 2:

Here is a graph of what I'm trying to achieve with proportional sizing by sample size n per Trt (treatment), where the mean is calculated per Trt and Branch number. I'm wondering if I should make Branch a categorical variable.

[Figure: plot without proportional sizing]


Solution

  • If I understood you correctly, you would like to scale the size of the points based on the number of points per Trt group.

    How about something like this? Note that I appended one Late row to your sample data, because Trt otherwise contains only Early entries.

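    library(dplyr)
    library(ggplot2)

    # df is built under "Sample data" below
    df %>%
        group_by(Trt) %>%
        mutate(ssize = n()) %>%   # number of rows in each Trt group
        ggplot(aes(x = Branch, y = Flow_T, colour = Trt, size = ssize)) +
            geom_point()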
    


    Explanation: We group by Trt, calculate the number of samples per group as ssize, and plot with aes(..., size = ssize) so that the size of the points scales with ssize. You don't need the group aesthetic here.
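    If the goal is to draw one point per group mean rather than every observation, the same idea can be applied after collapsing the data first. The following is a minimal sketch, not the answer's original code: it assumes the mean and the sample size are both wanted per Trt and Branch, uses a made-up data frame toy with the question's column names, and relies on the .groups argument of summarise(), which needs dplyr 1.0.0 or later.

    ```r
    library(dplyr)
    library(ggplot2)

    # Hypothetical toy data; only the column names come from the question
    toy <- data.frame(
        Branch = c(1, 1, 1, 2, 2, 1, 1, 2),
        Flow_T = c(36, 18, 17, 11, 10, 20, 22, 9),
        Trt    = c("Early", "Early", "Early", "Early", "Early",
                   "Late", "Late", "Late")
    )

    # One row per Trt/Branch combination: the mean and the n behind it
    means <- toy %>%
        group_by(Trt, Branch) %>%
        summarise(mean_Flow_T = mean(Flow_T), n = n(), .groups = "drop")

    # Map n to point size; scale_size_area() makes point area proportional to n
    p <- ggplot(means, aes(x = Branch, y = mean_Flow_T, colour = Trt, size = n)) +
        geom_point() +
        scale_size_area()
    p
    ```

    To size by the number of samples per Trt only, drop Branch from the n calculation, for example by adding the count with mutate() after group_by(Trt) before summarising.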


    Update

    To scale points according to the mean of Flow_T per Trt we can do:

    df %>%
        group_by(Trt) %>%
        mutate(
            ssize = n(),                     # group size, kept for reference
            mean.Flow_T = mean(Flow_T)) %>%  # group mean, mapped to size below
        ggplot(aes(x = Branch, y = Flow_T, colour = Trt, size = mean.Flow_T)) +
            geom_point()
    



    Sample data

    # Sample data
    df <- read.table(text =
        "Plant Branch Pod_B Flow_Miss Pod_A Flow_T Trt     Dmg
    1     1   1.00     0        16    20  36.0  Early     1
    2     1   2.00     0         1    17  18.0  Early     1
    3     1   3.00     0         0    17  17.0  Early     1
    4     1   4.00     0         3    14  17.0  Early     1
    5     1   5.00     5         2     4  11.0  Early     1
    6     1   6.00     0         3     7  10.0  Early     1
    7     1   7.00     0         4     6  10.0  Early     1
    8     1   8.00     0        13     6  19.0  Early     1
    9     1   9.00     0         2     7   9.00 Early     1
    10     1  10.0      0         2     3   5.00 Early     1
    11     1  10.0      0         2     3   20.00 Late     1", header = TRUE)
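
    Before plotting, it can help to check which group sizes the size aesthetic will actually receive. This is a small sketch under assumptions: toy_trt is a hypothetical stand-in that reproduces only the Trt column of the sample data above (ten Early rows and one Late row), and the name argument of count() needs dplyr 0.8.0 or later.

    ```r
    library(dplyr)

    # Hypothetical stand-in for the Trt column of the sample data
    toy_trt <- data.frame(Trt = c(rep("Early", 10), "Late"))

    # One row per treatment with its number of observations
    counts <- toy_trt %>%
        count(Trt, name = "ssize")
    print(counts)
    ```

    With only one Late observation against ten Early ones, the Late point will be drawn much smaller than the others.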