R: how to prevent ggplot geom_text() from using new data in a previously saved plot object


I am attempting to make a series of plots using the same code with unique coral species databases.

Databases

data_1 <- structure(list(Site_long = structure(c(1L, 1L, 2L, 2L), .Label = c("Hanauma Bay", 
"Waikiki"), class = "factor"), Shelter = structure(c(1L, 2L, 
1L, 2L), .Label = c("Low", "High"), class = c("ordered", "factor"
)), mean = c(1.19986885018767, 2.15593884020962, 0.369605100791602, 
0.31005865611133), sd = c(2.5618758944073, 3.67786619671933, 
1.0285671157698, 0.674643037178562), lower = c(0.631321215232725, 
1.33972360808602, 0.141339007832154, 0.160337623931733), upper = c(1.76841648514261, 
2.97215407233321, 0.59787119375105, 0.459779688290928), sample_size = c(78L, 
78L, 78L, 78L)), row.names = c(NA, -4L), groups = structure(list(
    Site_long = structure(1:2, .Label = c("Hanauma Bay", "Waikiki"
    ), class = "factor"), .rows = structure(list(1:2, 3:4), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = 1:2, class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

data_2 <- structure(list(Site_long = structure(c(2L, 2L, 1L, 1L), .Label = c("Hanauma Bay", 
"Waikiki"), class = "factor"), Shelter = structure(c(1L, 2L, 
1L, 2L), .Label = c("Low", "High"), class = c("ordered", "factor"
)), mean = c(0.695203162997812, 0.838720069947102, 0.76957780057238, 
0.771070502382599), sd = c(1.17117437618039, 1.02766824928792, 
1.43499288333539, 1.28634022958585), lower = c(0.435288768568787, 
0.610653459098997, 0.451115141323908, 0.485597776371556), upper = c(0.955117557426838, 
1.06678668079521, 1.08804045982085, 1.05654322839364), sample_size = c(78L, 
78L, 78L, 78L)), row.names = c(NA, -4L), groups = structure(list(
    Site_long = structure(1:2, .Label = c("Hanauma Bay", "Waikiki"
    ), class = "factor"), .rows = structure(list(3:4, 1:2), ptype = integer(0), class = c("vctrs_list_of", 
    "vctrs_vctr", "list"))), row.names = 1:2, class = c("tbl_df", 
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df", 
"tbl_df", "tbl", "data.frame"))

When I run my code on the first species database (data_1), the barplots and associated error bar annotations render correctly. Note that I also assign the database to a new variable, "data", which is the same object name used later for species 2. To keep this plot for a composite of several plots later, I saved it to the global environment under the name "species_1_plot".

Code for Species 1 Plot

library(ggplot2)

data <- data_1

# letters for the multiple-comparison annotations, one per bar
mult_compare_recruitment <- c("A", "A", "A", "A")

# reorder summary dataframe for plotting
data <- data[c(3, 4, 1, 2),]
data$Shelter <- factor(data$Shelter, levels = c("Low", "High"))

# x-axis order for the barplot: Waikiki (Low-High Shelter), then Hanauma Bay
position <- c("Waikiki", "Hanauma Bay")

recruitment_plot_3 <- ggplot(data = data, aes(fill=Shelter, y=mean, x=Site_long)) + 
  geom_bar(position = "dodge", stat="identity", width = .8) +
  scale_x_discrete(limits = position) +
  geom_errorbar(aes(ymin = lower, ymax = upper), position = position_dodge(.8), width = .1) +
  geom_text(aes(label = mult_compare_recruitment, y = data$upper), vjust = -.5, position = position_dodge(width = 0.8), size = 4) +
  scale_fill_grey(name = "Shelter", start = .8, end = .2) +
  labs(x = "Site", y = expression(paste("Coral recruitment per m"^"2"))) + 
  theme_classic(base_size = 14.5) +
  theme(text = element_text(size = 18), axis.title.x = element_blank(),
        legend.position = "none", axis.text.y = element_text(angle = 90))

species_1_plot <- recruitment_plot_3

species_1_plot

[Image: species_1_plot]

To create my next plot, I run the same code on a different species database (data_2), again assigning the new database to the object "data". I then saved the new plot to the global environment as "species_2_plot".

Code for Species 2 Plot

data <- data_2

# letters for the multiple-comparison annotations, one per bar
mult_compare_recruitment <- c("A", "A", "B", "B")

# reorder summary dataframe for plotting
data <- data[c(3, 4, 1, 2),]
data$Shelter <- factor(data$Shelter, levels = c("Low", "High"))

# x-axis order for the barplot: Waikiki (Low-High Shelter), then Hanauma Bay
position <- c("Waikiki", "Hanauma Bay")

recruitment_plot_3 <- ggplot(data = data, aes(fill=Shelter, y=mean, x=Site_long)) + 
  geom_bar(position = "dodge", stat="identity", width = .8) +
  scale_x_discrete(limits = position) +
  geom_errorbar(aes(ymin = lower, ymax = upper), position = position_dodge(.8), width = .1) +
  geom_text(aes(label = mult_compare_recruitment, y = data$upper), vjust = -.5, position = position_dodge(width = 0.8), size = 4) +
  scale_fill_grey(name = "Shelter", start = .8, end = .2) +
  labs(x = "Site", y = expression(paste("Coral recruitment per m"^"2"))) + 
  theme_classic(base_size = 14.5) +
  theme(text = element_text(size = 18), axis.title.x = element_blank(),
        legend.position = "none", axis.text.y = element_text(angle = 90))

species_2_plot <- recruitment_plot_3

species_2_plot

[Image: species_2_plot]

The problem is, when I plot the first species plot again (species_1_plot), the data are correct (bars), but the height of text annotations and their letter values are not correct. They are in fact the values from species_2_plot.

species_1_plot

[Image: species_1_plot redrawn, with the annotation heights and letters from species 2]

I saved each plot to the global environment under a unique name, knowing that reusing object names could be an issue. Despite this, geom_text() seems to use the values from the second plot (the objects currently in the global environment), even though the bars in the plot are still correct (from species_1_plot). My understanding was that naming a plot as an object (species_1_plot and species_2_plot) is akin to saving the plot, and therefore prevents any changes to the plot and its annotations unless specified. Is there any way to prevent this from happening without specifically naming the databases (data_1 and data_2)? All input is appreciated. Thanks in advance!
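
The behaviour described above can be reproduced in a few lines. The sketch below is a toy example, not the coral data: df and labs_vec are made-up names, and layer_data() is used only to inspect what the text layer would draw. It suggests that an expression inside aes() that is not a column of the plot data is looked up again each time the plot is built:

```r
library(ggplot2)

# Toy data frame (hypothetical values, not the coral summaries)
df <- data.frame(x = c("a", "b"), y = c(1, 2))

# The labels live in a free-standing vector, not in a column of df
labs_vec <- c("A", "A")

p <- ggplot(df, aes(x, y)) +
  geom_col() +
  geom_text(aes(label = labs_vec))     # labs_vec is evaluated when p is built

# Inspect the labels the text layer (layer 2) would draw
first_build <- layer_data(p, 2)$label   # "A" "A"

# Reassign the vector without touching p
labs_vec <- c("B", "B")

# Building the same plot object again picks up the new values
second_build <- layer_data(p, 2)$label  # "B" "B"
```

The plot object p is never modified here. Only the vector it refers to changes, which matches the symptom of species_1_plot showing the letters of species 2.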


Solution

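  • I would suggest an approach with a function. In your code, mult_compare_recruitment and data$upper inside aes() are not columns of the data stored in the plot. They are expressions that ggplot2 evaluates each time the plot is drawn, using whatever "data" and "mult_compare_recruitment" hold in the global environment at that moment. The bars and error bars are mapped to column names (mean, lower, upper), which are resolved in the data stored inside the plot, so they stay correct. A function works in its own internal environment, so each plot keeps referring to the values it was created with. I have made a function with parameters for the data, the annotation letters and the position. Fill them in the same way you defined those variables in your code. Here is the code, using the data you shared: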

    library(ggplot2)

    # Build one recruitment barplot from a summary data frame.
    #   data:     summary data frame (Site_long, Shelter, mean, lower, upper)
    #   labels:   annotation letters, one per row after reordering
    #   position: order of the sites on the x axis
    myplotfunc <- function(data, labels, position)
    {
        # reorder summary dataframe for plotting
        data <- data[c(3, 4, 1, 2),]
        data$Shelter <- factor(data$Shelter, levels = c("Low", "High"))

        # store the letters as a column so the plot carries them with its data
        data$label <- labels

        plot <- ggplot(data = data, aes(fill=Shelter, y=mean, x=Site_long)) + 
            geom_bar(position = "dodge", stat="identity", width = .8) +
            scale_x_discrete(limits = position) +
            geom_errorbar(aes(ymin = lower, ymax = upper), position = position_dodge(.8), width = .1) +
            # map to columns (label, upper) rather than to outside objects
            geom_text(aes(label = label, y = upper), vjust = -.5, position = position_dodge(width = 0.8), size = 4) +
            scale_fill_grey(name = "Shelter", start = .8, end = .2) +
            labs(x = "Site", y = expression(paste("Coral recruitment per m"^"2"))) + 
            theme_classic(base_size = 14.5) +
            theme(text = element_text(size = 18), axis.title.x = element_blank(),
                  legend.position = "none", axis.text.y = element_text(angle = 90))
        return(plot)
    }

    # one plot per species
    o1 <- myplotfunc(data = data_1, labels = c("A", "A", "A", "A"), position = c("Waikiki", "Hanauma Bay"))
    o2 <- myplotfunc(data = data_2, labels = c("A", "A", "B", "B"), position = c("Waikiki", "Hanauma Bay"))

    Outputs:

    [Image: plot o1 (species 1)]

    [Image: plot o2 (species 2)]
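
If you would rather not wrap everything in a function, a possible alternative is to keep the letters inside the data frame and map only column names in aes(). The sketch below uses toy stand-ins, not your data: toy_1, toy_2 and the columns site, mean, upper and label are hypothetical names chosen for illustration. It relies on the plot object keeping the data frame it was created with, so reusing the name "data" afterwards should not affect an earlier plot:

```r
library(ggplot2)

# Toy stand-ins for two species summaries (hypothetical values)
toy_1 <- data.frame(site = c("Waikiki", "Hanauma Bay"),
                    mean = c(0.4, 1.2), upper = c(0.6, 1.8))
toy_2 <- data.frame(site = c("Waikiki", "Hanauma Bay"),
                    mean = c(0.7, 0.8), upper = c(1.0, 1.1))

# Species 1: the letters are a column of the data, not a separate vector
data <- toy_1
data$label <- c("A", "A")
plot_1 <- ggplot(data, aes(x = site, y = mean)) +
  geom_col() +
  geom_text(aes(label = label, y = upper), vjust = -0.5)

# Species 2: reuse the same object name "data"
data <- toy_2
data$label <- c("B", "B")
plot_2 <- ggplot(data, aes(x = site, y = mean)) +
  geom_col() +
  geom_text(aes(label = label, y = upper), vjust = -0.5)

# The text layer (layer 2) of plot_1 still uses the species 1 values
labels_1  <- layer_data(plot_1, 2)$label
heights_1 <- layer_data(plot_1, 2)$y
```

Here labels_1 and heights_1 come from toy_1 even though "data" now holds toy_2, because every aesthetic refers to a column of the data stored in the plot.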