Tags: r, ggplot2, boxplot, jitter

How to color data points on a box plot by certain observations of a group in R


I'm trying to color the data points on my box plot by a grouping variable, but only for a subset of the values within that variable. For example:

I have a data set that looks something like this:

Value   Make   Location
40      Honda   USA
50      Toyota  CHINA
60      Ford    FRANCE
70      Subaru  CHINA
50      Honda   BRAZIL
50      Toyota  SPAIN
30      Nissan  CANADA

I'm drawing a box plot with the Value variable on the y axis and the Make variable on the x axis. I then want to add all the data points to the box plot and color only those where Location is CHINA, BRAZIL, or SPAIN (each a different color), while every other data point stays black.

This is my code:

library(dplyr)    # provides the %>% pipe
library(ggplot2)

data %>% ggplot(aes(x=Make, y=Value)) +
         geom_boxplot() +
         geom_jitter(aes(color=Location))

But this colors every data point by the Location variable. I need the points colored only when Location is CHINA, BRAZIL, or SPAIN, while still showing all the data points. How can I achieve this? Any suggestions would be greatly appreciated!


Solution

  • You can create an extra column that keeps the location for the countries you want to highlight and holds "Others" for everything else. Then use scale_color_manual to assign the colors you want.

    library(data.table)
    df <- fread('Value   Make   Location
    40      Honda   USA
    50      Toyota  CHINA
    60      Ford    FRANCE
    70      Subaru  CHINA
    50      Honda   BRAZIL
    50      Toyota  SPAIN
    30      Nissan  CANADA')
    
    library(dplyr)
    #> 
    #> Attaching package: 'dplyr'
    #> The following objects are masked from 'package:data.table':
    #> 
    #>     between, first, last
    #> The following objects are masked from 'package:stats':
    #> 
    #>     filter, lag
    #> The following objects are masked from 'package:base':
    #> 
    #>     intersect, setdiff, setequal, union
    library(ggplot2)
    visible_ctr <- c('BRAZIL','CHINA','SPAIN')
    df %>% 
      mutate(lcat=if_else(Location %in% visible_ctr,Location,'Others')) %>% 
      mutate(lcat=factor(lcat,levels = c(visible_ctr,'Others'))) %>% # Reorder legend
      ggplot(aes(x=Make, y=Value)) +
      geom_boxplot() +
      geom_jitter(aes(color=lcat)) +
      scale_color_manual(values = c(BRAZIL='blue',CHINA='red',SPAIN='yellow',Others='black'))
    

    Created on 2020-02-20 by the reprex package (v0.3.0)
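
To see what the recoding step produces before any plotting happens, here is a minimal sketch in base R only (no data.table or dplyr). It rebuilds the same toy data with data.frame() and uses ifelse() in place of if_else(); both of those substitutions are assumptions made for illustration and are not part of the answer above.

```r
# Same toy data as in the answer, built with base R only (assumption:
# data.frame() stands in for the fread() call used above)
df <- data.frame(
  Value    = c(40, 50, 60, 70, 50, 50, 30),
  Make     = c("Honda", "Toyota", "Ford", "Subaru", "Honda", "Toyota", "Nissan"),
  Location = c("USA", "CHINA", "FRANCE", "CHINA", "BRAZIL", "SPAIN", "CANADA"),
  stringsAsFactors = FALSE
)

visible_ctr <- c("BRAZIL", "CHINA", "SPAIN")

# Keep the location for highlighted countries, collapse the rest into "Others"
lcat <- ifelse(df$Location %in% visible_ctr, df$Location, "Others")

# Fix the level order so the legend lists the highlighted countries first
df$lcat <- factor(lcat, levels = c(visible_ctr, "Others"))

# Count how many points fall into each color group
print(table(df$lcat))
```

The names given to scale_color_manual(values = ...) in the answer have to match these factor levels exactly, which is why USA, FRANCE, and CANADA all end up drawn with the single color assigned to Others.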