
Dots disconnected from lines when using geom_path and geom_point. Fixed, but now I get "No summary function supplied, defaulting to `mean_se()`"


I have the following data:

structure(list(Expo = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L), .Label = c("DC", "DI"), class = "factor"), Quail = c(5L, 
6L, 16L, 17L, 28L, 29L, 30L, 53L, 54L, 11L, 12L, 46L, 48L, 60L, 
11L, 48L, 6L, 5L, 6L, 18L, 29L, 30L, 53L, 11L, 36L, 46L, 47L, 
60L, 11L, 4L, 5L, 6L, 16L, 17L, 28L, 29L, 30L, 52L, 53L, 54L), 
    Segment = c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Position = c(1949L, 
    1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 
    1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 
    1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 
    1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 1949L, 
    1949L, 1949L, 1949L), Freq = c(0.034496, 0.034845, 0.031079, 
    0.020761, 0.037311, 0.047204, 0.062257, 0.100617, 0.022637, 
    0.587758, 0.470607, 0.037855, 0.02897, 0.034457, 0.87815, 
    0.022788, 0.169897, 0.058831, 0.116039, 0.032077, 0.081132, 
    0.09126, 0.051852, 0.896703, 0.09873, 0.054908, 0.027505, 
    0.50293, 0.975181, 0.03713, 0.092243, 0.028103, 0.044125, 
    0.057707, 0.091152, 0.085498, 0.130286, 0.030099, 0.049717, 
    0.070069), day = c(3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 
    3L, 3L, 3L, 3L, 7L, 7L, 7L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 
    5L, 1L, 1L, 8L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L
    )), row.names = c(NA, -40L), class = "data.frame")

When I run

ggplot(Expo.Shared.PB1, aes(x=as.numeric(day), y=Freq, color = as.character(Quail))) +
  geom_path() +
  geom_point() +
  facet_grid(Expo~.)

It gives me the following (incorrect) plot.


I solved this by adding stat="summary" to both geoms, but doing so gives me the following message: No summary function supplied, defaulting to `mean_se()`

ggplot(Expo.Shared.PB1, aes(x=as.numeric(day), y=Freq, color = as.character(Quail))) +
  geom_path(stat = "summary")+
  geom_point(stat = "summary") +
  facet_grid(Expo~.)

The output:


The plot seems to be what I am looking for. Now:

What is stat="summary" really doing? Are the plotted values modified from the original values? Is it OK to overlook the message? (I am sure it is not.)


Solution

  • OK. I think I see the issue. [ Having the "correct" graph to compare with the "incorrect" one was helpful! :) ]

    geom_path simply "joins the dots". It takes the points in the dataset and joins them in the order in which they appear. My first thought was that your dataset isn't sorted as you expect. So, taking Quail == 11 as an example:

    Expo.Shared.PB1 %>% filter(Quail == 11)
    # A tibble: 4 x 6
      Expo  Quail Segment Position  Freq   day
      <fct> <int>   <int>    <int> <dbl> <int>
    1 DC       11       2     1949 0.588     3
    2 DC       11       2     1949 0.878     7
    3 DC       11       2     1949 0.897     5
    4 DC       11       2     1949 0.975     8
    

    And indeed, that is the case. So the solution is simple. Sort the data into the order you want before plotting:

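    library(dplyr)    # filter(), arrange(), %>%
    library(ggplot2)

    # Sort within each quail by day so geom_path() joins the points in time order
    Expo.Shared.PB1 %>% 
      arrange(Quail, day) %>% 
      ggplot(aes(x=as.numeric(day), y=Freq, color = as.character(Quail))) +
        geom_path() +
        geom_point() +
        facet_grid(Expo~.)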
    


    which I think is what you want, without the need to use stat="summary".
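
    As a sketch of an alternative that avoids sorting by hand: geom_line() connects observations in order of the x variable, whereas geom_path() uses the row order of the data. The data frame `toy` below is made up for illustration (it mimics the four Quail == 11 rows but is not the question's data), and `p_line` and `line_days` are just names chosen here.

    ```r
    library(ggplot2)

    # Hypothetical toy data: one quail, rows deliberately not in day order
    toy <- data.frame(
      Quail = c(11L, 11L, 11L, 11L),
      day   = c(3L, 7L, 5L, 8L),
      Freq  = c(0.59, 0.88, 0.90, 0.98)
    )

    # geom_line() joins the points in order of x, so no arrange() is needed
    p_line <- ggplot(toy, aes(x = day, y = Freq, color = factor(Quail))) +
      geom_line() +
      geom_point()

    # Inspect the x-values in the order the line layer will draw them
    line_days <- layer_data(p_line, 1)$x
    ```

    If the path really should follow row order (for example, a trajectory that doubles back on itself), geom_path() with explicit sorting remains the right tool.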

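    So, why did stat="summary" give you what you wanted, albeit with a message? This is my theory. stat="summary" summarises the y-values at each distinct x-value within each group (here, each Quail within each facet), and with no function supplied it falls back to mean_se(), which returns the mean and the mean plus or minus one standard error. To do so, it has to split the data by group and x-value, and the summarised rows appear to come back ordered by x. So you get the ordering you want as a by-product of the call to stat="summary", not because of the summary itself. As for whether the plotted values are modified: each point is a mean, so it equals the original value only where a Quail has a single observation for a given day; with more than one observation per day the points would be averages. For that reason I would not rely on it here.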
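
    To check this on your own data, one option is to look at what mean_se() returns and to count the observations per Quail and day. The following is a minimal sketch on a made-up data frame (`toy`, not the question's data). Passing `fun = mean` explicitly is one way to avoid the message; this assumes ggplot2 3.3.0 or later, since older versions named the argument `fun.y`.

    ```r
    library(ggplot2)

    # Hypothetical toy data: one quail, rows deliberately not in day order
    toy <- data.frame(
      Quail = c(11L, 11L, 11L, 11L),
      day   = c(3L, 7L, 5L, 8L),
      Freq  = c(0.59, 0.88, 0.90, 0.98)
    )

    # mean_se() returns the mean (y) and the mean -/+ one standard error (ymin, ymax)
    se_two <- mean_se(c(0.2, 0.4))

    # With at most one observation per Quail and day, each mean is the observation itself
    obs_per_cell <- table(toy$Quail, toy$day)
    unchanged <- all(obs_per_cell <= 1)

    # Naming the summary function explicitly avoids the "No summary function" message
    p_mean <- ggplot(toy, aes(x = day, y = Freq, color = factor(Quail))) +
      geom_path(stat = "summary", fun = mean) +
      geom_point(stat = "summary", fun = mean)

    # The summarised values that the path layer will draw, one row per day
    path_data <- layer_data(p_mean, 1)
    ```

    If `unchanged` is FALSE for your data, the plotted points are averages rather than raw values, and sorting the data is the safer fix.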

    PS: to get the values of Quail to appear in numerical rather than lexicographical order, use color=as.factor(Quail) (and use scale_color_discrete(name="Quail") to adjust the legend title if necessary).
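
    The difference between the two conversions can be seen without plotting anything. A small sketch, using a made-up vector `quail` of IDs:

    ```r
    # Hypothetical quail IDs, stored as integers like the Quail column
    quail <- c(5L, 11L, 4L, 30L)

    # Character values sort lexicographically: "11" comes before "4"
    chr_order <- sort(as.character(quail))   # "11" "30" "4" "5"

    # Factor levels built from integers follow numeric order
    fct_levels <- levels(as.factor(quail))   # "4" "5" "11" "30"
    ```

    The legend follows the order of the factor levels, which is why as.factor(Quail) lists the quail numerically.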