How can I modify this scatterplot to include a hierarchy based on a 3rd column of data?


I want to make a scatterplot of PM2.5 against life expectancy. Within it, I want 5 subcategories based on the GDP data (5 differently coloured sets of points and lines, from high to low GDP). How would I modify my current code to do this (or something similar)? Code and data are below; any help is much appreciated.

plot = ggplot(dat6, aes(x = log(PM2.5), y = log(Life_Expectancy))) +
  geom_point(colour = 'blue') +
  stat_smooth(method = "lm", col = "red") + 
  xlab("Concentration of PM2.5") +
  ylab("Life Expectancy") +
  ggtitle("Relationship between Life expectancy and PM2.5")



dat6
                 Country Life_Expectancy         GDP     PM2.5
1                Afghanistan        60.38333   1788.3152 53.933333
2                    Albania        77.03333  10642.3801 20.408333
3                    Algeria        75.16667  13674.2199 31.521667
4                     Angola        51.96667   6770.9149 37.346667
5        Antigua and Barbuda        75.98333  20893.5925 20.415000
6                  Argentina        75.93333  19838.7166 11.893333
7                    Armenia        74.26667   7728.3425 33.143333
8                  Australia        82.36667  43862.4894  7.338333
9                    Austria        84.00000  46586.1927 14.303333
10                Azerbaijan        72.00000  16804.9607 20.308333

Solution

  • Here is an example of what the question asks for.

    cut is used to create a new column, GDP_Level, from a vector of break points, brks. Each interval between consecutive break points becomes one level, and the levels are given names ranging from "Very Low" to "Very High".
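
    To see how cut assigns levels, here is a minimal sketch on a bare vector. The five values are rounded from the GDP column of the sample data and are chosen only so that one falls in each interval; the names gdp and lvl are illustrative.

    ```r
    # Illustrative GDP values, one per interval (rounded from the sample data)
    gdp <- c(1788, 6771, 13674, 20894, 46586)

    # Six break points define five intervals: (0, 5000], (5000, 10000], ...
    brks <- c(0, 5000, 10000, 20000, 40000, Inf)

    # cut() returns a factor; labels must have one entry per interval
    lvl <- cut(gdp, breaks = brks,
               labels = c("Very Low", "Low", "Medium", "High", "Very High"))

    as.character(lvl)
    ```

    By default the intervals are closed on the right, so a GDP of exactly 5000 would count as "Very Low".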

    As for the plot, I have removed the log transformations from the aes() mapping and included them as transformations in both scale_*_continuous calls instead.

    dat6 <- read.table(text = "
                     Country Life_Expectancy         GDP     PM2.5
    1                Afghanistan        60.38333   1788.3152 53.933333
    2                    Albania        77.03333  10642.3801 20.408333
    3                    Algeria        75.16667  13674.2199 31.521667
    4                     Angola        51.96667   6770.9149 37.346667
    5        'Antigua and Barbuda'        75.98333  20893.5925 20.415000
    6                  Argentina        75.93333  19838.7166 11.893333
    7                    Armenia        74.26667   7728.3425 33.143333
    8                  Australia        82.36667  43862.4894  7.338333
    9                    Austria        84.00000  46586.1927 14.303333
    10                Azerbaijan        72.00000  16804.9607 20.308333
    ", header = TRUE)
    
    library(ggplot2)
    
    brks <- c(0, 5000, 10000, 20000, 40000, Inf)
    dat6$GDP_Level <- cut(dat6$GDP, breaks = brks, labels = c("Very Low", "Low", "Medium", "High", "Very High"))
    
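    # GDP_Level is mapped to colour in aes(), so no fixed colours are set in the
    # layers below; a fixed colour in a layer would override the mapping.
    ggplot(dat6, aes(x = PM2.5, y = Life_Expectancy, color = GDP_Level)) +
      geom_point() +
      # One regression line per GDP level. In this 10-row sample some levels
      # hold only one or two points, so their fitted lines are not meaningful.
      stat_smooth(formula = y ~ x, method = "lm") + 
      xlab("Concentration of PM2.5") +
      ylab("Life Expectancy") +
      scale_x_continuous(trans = "log") +
      scale_y_continuous(trans = "log") +
      ggtitle("Relationship between Life expectancy and PM2.5")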
    

    Created on 2022-02-21 by the reprex package (v2.0.1)
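
    If fixed thresholds are not wanted, the break points could instead be taken from the data. The following is a minimal sketch, assuming five equal-sized groups (quintiles) are an acceptable reading of "high to low GDP". The names q_brks and gdp_quintile are illustrative, and the gdp vector simply repeats the GDP column of the sample.

    ```r
    # GDP values copied from the sample data, in row order
    gdp <- c(1788.3152, 10642.3801, 13674.2199, 6770.9149, 20893.5925,
             19838.7166, 7728.3425, 43862.4894, 46586.1927, 16804.9607)

    # Break points at the 0%, 20%, 40%, 60%, 80% and 100% quantiles of GDP
    q_brks <- quantile(gdp, probs = seq(0, 1, by = 0.2))

    # include.lowest = TRUE keeps the minimum value inside the first interval
    gdp_quintile <- cut(gdp, breaks = q_brks, include.lowest = TRUE,
                        labels = c("Very Low", "Low", "Medium", "High", "Very High"))

    # Number of countries per level
    table(gdp_quintile)
    ```

    The resulting factor could be assigned to dat6$GDP_Level in place of the fixed-break version, leaving the plotting code unchanged.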