Tags: r, ggplot2, label, linegraph, geom-text

Error with geom_text_repel when adding text labels to line graph from a different data set


Disclaimer: I found something similar to this problem in a different post but the solution is not quite what I need.

I have a data set, TGA, with a few time and temperature series involving different treatments.

>    TGA
# A tibble: 16,662 x 4
   `t [s]` `Ts [°C]` `Value [mg]` Treatment   
     <int> <chr>     <chr>        <chr>       
 1       0 28.686    68.9369      C_Water_Air,
 2       1 28.657    68.9368      C_Water_Air,
 3       2 28.688    68.937       C_Water_Air,
 4       3 28.751    68.9373      C_Water_Air,
 5       4 28.939    68.9377      C_Water_Air,
 6       5 29.123    68.9378      C_Water_Air,
 7       6 29.324    68.9381      C_Water_Air,
 8       7 29.51     68.9386      C_Water_Air,
 9       8 29.721    68.9379      C_Water_Air,
10       9 29.922    68.9341      C_Water_Air,
# ... with 16,652 more rows
    

I then plot this data with geom_path and add text labels from a second data set, decar_cotton_Air, which I calculated from the TGA data set.

> decar_cotton_Air    
Groups:   Treatment [6]
      `t [s]` `Ts [°C]` `Value [mg]` Treatment       round_temp weight_difference reaction     
        <int>     <dbl>        <dbl> <chr>                <dbl>             <dbl> <chr>        
    1    2629      900.         65.7 C_Water_Air,           900             1.16  Decarbonation
    2    2629      900.         45.2 C_TSB_Air,             900             1.57  Decarbonation
    3    2630      900.         83.1 C_Sp1_Air,             900             0.972 Decarbonation
    4    2630      900.         84.8 C_Sh1_Air,             900             0.763 Decarbonation
    5    2629      900.         73.2 C_Positive_Air,        900             1.14  Decarbonation
    6    2630      900.         76.7 C_Open_Air,            900             3.90  Decarbonation

Essentially, I am using this second data set to label my graph with the treatment name and a value which is a difference in weight from x = 600 to x = 900. So far no problem.

ggplot(TGA, aes(`Ts [°C]`, `Value [mg]`, group = Treatment)) + 
      geom_path(aes(color = Treatment)) +
      labs(x = "Temperature [°C]", y = "Change in mass [mg]", title = "Thermogravimetric curve (TGA)", subtitle = "of lime mortar carbonated with Cotton") + 
      coord_cartesian(xlim = c(24, 950), ylim = c(45, 90)) +
      theme(legend.position = "none") + 
      geom_vline(xintercept = c(300,600,900), linetype = 3) +
      annotate("text", x = 450, y = 90, label = "dehydroxylation") + 
      annotate("text", x = 750, y = 90, label = "decarbonation") +
      geom_text_repel(aes(colour = Treatment), data= decar_cotton_Air, label = decar_cotton_Air$weight_difference, x = 750) +
      geom_text_repel(aes(colour = Treatment), data= decar_cotton_Air, label = decar_cotton_Air$Treatment, x = 100, nudge_y = 3)

Picture from TGA with labels from decar_cotton_Air

The problem is that when I run the same code with a different data set, I get the following error:

Error in FUN(X[[i]], ...) : object 'mean_weight_loss' not found

The new data set is the following:

> TGA_averages
# A tibble: 16,662 x 3
# Groups:   t [s] [2,777]
   `t [s]` Treatment  mean_weight_loss
     <int> <chr>                 <dbl>
 1       0 C_open                100  
 2       0 C_positive            100  
 3       0 C_sh1                 100. 
 4       0 C_sp1                 100  
 5       0 C_tsb                 100  
 6       0 C_water               100  
 7       1 C_open                100  
 8       1 C_positive            100.0
 9       1 C_sh1                 100. 
10       1 C_sp1                 100.0
# ... with 16,652 more rows

In this data set I have cleaned the Treatment labels, and mean_weight_loss is the TGA$`Value [mg]` variable converted to a percentage.

The code for the new plot is below. I changed the aes to fit the new dataset where x is time instead of temperature and y is the mean_weight_loss variable.

ggplot(TGA_averages, aes(`t [s]`, mean_weight_loss, group = Treatment)) + 
  geom_path(aes(color = Treatment)) +
  labs(x = "Time [s]", y = "Percentage change of mass [%]", title = "Thermogravimetric curve (TGA)", subtitle = "of lime mortars with Cotton") + 
  coord_cartesian(xlim = c(0, 3200), ylim = c(93.88025, 100)) +
  theme(legend.position = "none") +
  geom_vline(xintercept = c(300,600,900), linetype = 3) +
  annotate("text", x = 1200, y = 100, label = "dehydroxylation") + 
  annotate("text", x = 2100, y = 100, label = "decarbonation") +
  geom_text_repel(aes(colour = Treatment), data= decar_cotton_Air, label = decar_cotton_Air$weight_difference, x = 750) 
  

I have no idea how this differs from my previous code. I have searched everywhere and cannot solve it for the life of me. I know the problem is geom_text_repel because the plot runs fine without it. NB: In this image I managed to insert labels using geom_dl(aes(label = Treatment, color = Treatment), method = list(dl.combine("last.qp"))), but that does not give good results for the numbers, since I want them inside the graph.



Solution

  • The issue is that geom_text_repel() requires an x and a y aesthetic, and by default it inherits them from the aes() in the ggplot() call. This is not a problem in the first example, since y is mapped to Value [mg], and that column exists in both TGA and decar_cotton_Air.

    In the second example, you are plotting TGA_averages and mapping y = mean_weight_loss. Since geom_text_repel() is told to use decar_cotton_Air, it looks in that data frame for the columns named in the x and y mappings. It finds t [s] there, but not mean_weight_loss, hence the error.

    The solution is either to rename a column in decar_cotton_Air to mean_weight_loss, or to specify the y mapping separately in each geom instead of once for the whole plot. Here's an untested sketch of the second option to give you an idea:

    library(ggplot2)
    library(ggrepel)
    
    ggplot(TGA_averages, aes(x = `t [s]`, group = Treatment)) +  # only x and group aes
    
      # specify y aesthetic here
      geom_path(aes(y = mean_weight_loss, color = Treatment)) +
    
      labs(x = "Time [s]", y = "Percentage change of mass [%]", title = "Thermogravimetric curve (TGA)", subtitle = "of lime mortars with Cotton") +
      coord_cartesian(xlim = c(0, 3200), ylim = c(93.88025, 100)) +
      theme(legend.position = "none") +
      geom_vline(xintercept = c(300,600,900), linetype = 3) +
      annotate("text", x = 1200, y = 100, label = "dehydroxylation") +
      annotate("text", x = 2100, y = 100, label = "decarbonation") +
    
      # specify a different y aes here, using a column that exists in decar_cotton_Air;
      # label is mapped inside aes() so it is looked up in the layer's own data
      geom_text_repel(
        aes(colour = Treatment, y = `Value [mg]`, label = weight_difference),
        data = decar_cotton_Air,
        x = 750)
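
The inheritance behaviour described above can be checked on toy data. This is a minimal sketch, not your actual data: the data frames `curves` and `labels`, the columns `t` and `value_mg`, and all values are made-up stand-ins, and plain geom_text() from ggplot2 is used in place of geom_text_repel() on the assumption that both inherit plot-level aesthetics in the same way.

```r
library(ggplot2)

# Toy stand-ins for TGA_averages and decar_cotton_Air (made-up values)
curves <- data.frame(
  t = rep(0:2, times = 2),
  Treatment = rep(c("A", "B"), each = 3),
  mean_weight_loss = c(100, 99, 98, 100, 98, 96)
)
labels <- data.frame(
  t = c(2, 2),
  Treatment = c("A", "B"),
  value_mg = c(65.7, 45.2),
  weight_difference = c(1.16, 1.57)
)

# y is mapped at the plot level, so every layer inherits it
p <- ggplot(curves, aes(t, mean_weight_loss, group = Treatment)) +
  geom_path(aes(colour = Treatment))

# This layer inherits y = mean_weight_loss, a column `labels` does not have.
# The error only surfaces when the plot is built, so capture it here.
broken <- p + geom_text(aes(label = weight_difference), data = labels)
err <- tryCatch({ layer_data(broken, 2); NULL },
                error = function(e) conditionMessage(e))

# Fix 1: override y inside the layer with a column that exists in `labels`
fixed <- p + geom_text(aes(y = value_mg, label = weight_difference),
                       data = labels)
fixed_y <- layer_data(fixed, 2)$y

# Fix 2: switch inheritance off and map everything the layer needs itself
standalone <- p + geom_text(aes(x = t, y = value_mg, label = weight_difference),
                            data = labels, inherit.aes = FALSE)
standalone_y <- layer_data(standalone, 2)$y
```

With inherit.aes = FALSE the layer ignores the plot-level mappings entirely, which can be the more readable choice when a label data frame shares few columns with the main data.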
    

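    Note that `Value [mg]` is in mg (roughly 45 to 85 in decar_cotton_Air), while the y axis here shows percentages between about 94 and 100, so labels positioned by `Value [mg]` will fall outside the visible range. If you need the position of the labels to match the base data, I would recommend either using a data frame other than decar_cotton_Air for the labels or merging the two datasets so that they share the columns you map.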
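
One way the merge could look is sketched below with base R. Everything here is an assumption for illustration: the toy data frames `curves` and `diffs`, the column `t`, and the mean_weight_loss values are made up, and the Treatment names are assumed to match already (in your data they would first need harmonising, since one set uses names like C_Water_Air, and the other C_water). The idea is to take one point per curve from the plotted data, attach weight_difference to it, and use that as the label data, so every inherited aesthetic resolves and the labels sit on the percentage scale.

```r
library(ggplot2)

# Toy stand-ins (made-up curve values)
curves <- data.frame(
  t = rep(0:2, times = 2),
  Treatment = rep(c("C_open", "C_water"), each = 3),
  mean_weight_loss = c(100, 99, 96.1, 100, 99.5, 98.8)
)
diffs <- data.frame(
  Treatment = c("C_open", "C_water"),
  weight_difference = c(3.90, 1.16)
)

# Pick the last time point of each curve as the label position
last_points <- do.call(
  rbind,
  lapply(split(curves, curves$Treatment), function(d) d[which.max(d$t), ])
)

# Attach the values to display; the result has t, Treatment,
# mean_weight_loss and weight_difference
label_data <- merge(last_points, diffs, by = "Treatment")

# The label layer can now inherit x, y and group from the plot
p <- ggplot(curves, aes(t, mean_weight_loss, group = Treatment)) +
  geom_path(aes(colour = Treatment)) +
  geom_text(aes(label = weight_difference, colour = Treatment),
            data = label_data)
label_layer <- layer_data(p, 2)
```

Swapping geom_text() for geom_text_repel() from ggrepel should work with the same mappings if the labels overlap.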