Tags: r, ggplot2, colors, geom-bar

Change the color of 20% of the bars in a ggplot2 geom_bar plot


I'm trying to change the color of 9 states in the plot produced by the code below. These are the top mining states, and I want them to stand out. Modifying my data frame is probably the easiest step, but are there any other ideas?

ggplot(data = media_impact_by_state) +
  #geom_hline(yintercept=0,linetype="dashed", color = "red") +
  geom_bar(aes(x= reorder(GeoName,trustclimsciSSTOppose - mean(trustclimsciSSTOppose)), 
               y= CO2limitsOppose-mean(CO2limitsOppose), fill = "fill1"),
           stat = 'identity') +
  geom_point(aes(x = GeoName,  
                 y = trustclimsciSSTOppose - mean(trustclimsciSSTOppose),
                color = "dot1"),
                 size=3) +
  scale_color_manual(values = c("black"),
                     label = "Distrust of Scientists",
                     name = "Mean Deviation") +
  scale_fill_manual(values = c(fill1 = "darkorange1",fill2 = "blue"),
                    labels = c(fill1 = "Oppose Limits to Co2 Emissions",fill2 = "poop"),
                    name = "Mean Deviation") +
  labs(x = "State",
       y = "(%)",
       title = "Distrust of Scientists") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1,size=12),
        axis.text.y = element_text(size=14),
        axis.title.y = element_text(size=16),
        axis.title.x = element_text(size=16),
        plot.title = element_text(size=16,hjust=0.5))



Solution

  • It is difficult to offer precise guidance without seeing a subset of your data, but here is one suggestion: build a two-level grouping variable with ifelse() and map it to the fill aesthetic. Make sure this is wrapped inside the aes() call. The legend titled "Mean Deviation" should then split into two categories, and you can set their colors inside scale_fill_manual() as needed.

    ggplot(data = media_impact_by_state) +
      geom_bar(aes(x = reorder(GeoName, trustclimsciSSTOppose - mean(trustclimsciSSTOppose)), 
                   y = CO2limitsOppose - mean(CO2limitsOppose), 
                   fill = factor(ifelse(GeoName %in% c(...), "Top 20", "Bottom 80"),  # index the states
                                 levels = c("Top 20", "Bottom 80"))),  # explicit order; the default is alphabetical
               stat = 'identity') +
      geom_point(aes(x = GeoName,  
                     y = trustclimsciSSTOppose - mean(trustclimsciSSTOppose),
                     color = "dot1"),
                 size = 3) +
      scale_color_manual(name = "Mean Deviation",
                         values = c("black"),
                         labels = "Distrust of Scientists") +
      scale_fill_manual(name = "Mean Deviation", 
                        values = c("darkorange1",  # supply the vector of colors
                                   "blue"),
                        labels = c("Oppose (Top 20)",  # supply the vector of labels
                                   "Oppose (Bottom 80)")) +
      labs(x = "State",
           y = "(%)",
           title = "Distrust of Scientists") +
      theme(
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 12),
        axis.text.y = element_text(size = 14),
        axis.title.y = element_text(size = 16),
        axis.title.x = element_text(size = 16),
        plot.title = element_text(size = 16, hjust = 0.5)
        )
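
    To make the approach above easier to try, here is a minimal, self-contained sketch on made-up data. The data frame toy, the vector highlight_states, and the level names "Highlighted" and "Other" are all assumptions for illustration, not part of your data. It keeps the highlighted states in one vector and names the colors, so they do not depend on level order.

    ```r
    library(ggplot2)

    # Toy data: the values are made up purely for illustration.
    toy <- data.frame(
      GeoName = c("Alaska", "Ohio", "Texas", "Utah", "Wyoming"),
      CO2limitsOppose = c(38, 33, 36, 35, 45)
    )

    # Hypothetical list of states to highlight; replace with your own.
    highlight_states <- c("Wyoming", "Alaska")

    # Explicit levels control the order the legend and scale will use.
    toy$group <- factor(
      ifelse(toy$GeoName %in% highlight_states, "Highlighted", "Other"),
      levels = c("Highlighted", "Other")
    )

    # geom_col() is equivalent to geom_bar(stat = "identity").
    p <- ggplot(toy) +
      geom_col(aes(x = reorder(GeoName, CO2limitsOppose),
                   y = CO2limitsOppose - mean(CO2limitsOppose),
                   fill = group)) +
      scale_fill_manual(name = "Mean Deviation",
                        values = c(Highlighted = "blue", Other = "darkorange1"))
    ```

    Printing p draws the chart. Naming the entries of values ties each color to a level by name rather than by position.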
    

    However, if you want to flag the top 20 percent of states by some measure of mining output, consider adding a flag column to the existing data frame first. I'm not sure which standard you are using to determine the "top" mining states; that is for you to decide. For example, create a variable ahead of time, call it fill_col, and pass it to fill inside the aes() call. Here is how you could pre-process the data, assuming it has a column such as mining_output:

    library(dplyr)

    media_impact_by_state <- media_impact_by_state %>% 
      arrange(desc(mining_output)) %>%  # sort in descending order by mining output (optional)
      mutate(fill_col = mining_output > quantile(mining_output, .8))  # flag the top 20 percent
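
    As a rough sketch of how the flag could then feed the plot, the example below uses made-up data. The data frame toy, its values, and the mining_output column are assumptions for illustration. Because fill_col is logical, the scale entries are named "TRUE" and "FALSE".

    ```r
    library(dplyr)
    library(ggplot2)

    # Toy data; mining_output is a hypothetical column with made-up values.
    toy <- data.frame(
      GeoName = paste("State", LETTERS[1:10]),
      CO2limitsOppose = c(30, 42, 35, 38, 33, 45, 31, 36, 40, 34),
      mining_output = c(5, 90, 12, 40, 8, 100, 3, 20, 60, 15)
    )

    # Flag states whose output exceeds the 80th percentile.
    flagged <- toy %>%
      mutate(fill_col = mining_output > quantile(mining_output, 0.8))

    # Map the logical flag to fill and name the scale entries by value.
    p <- ggplot(flagged) +
      geom_col(aes(x = reorder(GeoName, CO2limitsOppose),
                   y = CO2limitsOppose - mean(CO2limitsOppose),
                   fill = fill_col)) +
      scale_fill_manual(name = "Mean Deviation",
                        values = c("TRUE" = "blue", "FALSE" = "darkorange1"),
                        labels = c("TRUE" = "Oppose (top 20%)",
                                   "FALSE" = "Oppose (other states)"))
    ```

    With ten rows, the strict > comparison flags the two states with the largest mining_output. Use >= if states exactly at the cutoff should also count.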
    

    In the end, there's nothing wrong with manually typing in all the states that you want to highlight, though it is harder on the eyes and could become unwieldy if you had more than 50 states (or 51 if you included the District of Columbia).

    I hope this helps!