
ggplot & boxplot: is it possible to add weights?


I'm trying to draw boxplots of wages by area.

This is a sample of my dataset (it is provided by a research institute):

> head(final2, 20)
   nquest nord ireg staciv etalav acontrib       nome_reg tpens   pesofit
1     173    1   18      3     25       35       Calabria  1800 0.3801668
2    2886    1   13      1     26       35        Abruzzo  1211 0.2383701
3    2886    2   13      1     20       42        Abruzzo  2100 0.2383701
4    5416    1    8      3     16       30 Emilia Romagna   700 0.8819879
5    7886    1    9      1     22       35        Toscana  2000 1.2452078
6   20297    1    5      1     14       39         Veneto  1200 1.6694498
7   20711    2    4      1     15       37       Trentino  2000 3.3746801
8   22169    1   15      4     40        5       Campania   600 1.6875562
9   22276    1    8      2     18       37 Emilia Romagna  1200 2.1782894
10  22286    1    8      1     15       19 Emilia Romagna   850 3.0333999
11  22286    2    8      1     15       35 Emilia Romagna   650 3.0333999
12  22657    1   16      1     25       40         Puglie  1400 0.3616937
13  22657    2   16      1     26       36         Puglie  1500 0.3616937
14  23490    1    5      2     23       36         Veneto  1400 0.9763965
15  24147    1    4      1     26       35       Trentino  1730 1.2479984
16  24147    2    4      1     18       45       Trentino  1600 1.2479984
17  24853    1   11      1     18       38         Marche  2180 0.3475683
18  27238    1   12      1     16       31          Lazio  1050 3.6358952
19  27730    1   20      1     15       37       Sardegna  1470 0.7232677
20  27734    1   20      1     16       45       Sardegna  1159 0.6959107

The variables:

  1. nquest = the family code
  2. nord = the member within the family
  3. nome_reg = the area where they live
  4. tpens = the wage each person earns
  5. pesofit = the weight of each observation

This is the code I'm using

library(dplyr)
library(ggplot2)

# Regions to plot, in display order
regions <- c("Piemonte", "Valle D'Aosta", "Lombardia", "Liguria")

final2 %>%
  filter(nome_reg %in% regions) %>%
  ggplot(aes(x = factor(nome_reg, levels = regions),
             y = tpens, fill = nome_reg)) +
  geom_boxplot(varwidth = TRUE)  # box width reflects the number of observations per group

Which gives me this plot

(image: boxplot of tpens by region)

Is there a way to plot a weighted boxplot? I mean a boxplot that takes into account the weight of each observation (pesofit) when summarizing the wage tpens of the individuals in each area.

I'm already running a weighted regression, so I would like to visualize the weighted data as well.

I've tried adding weight = pesofit inside aes():

 # Regions to plot, in display order
 regions <- c("Piemonte", "Valle D'Aosta", "Lombardia", "Liguria")

 final2 %>%
   filter(nome_reg %in% regions) %>%
   ggplot(aes(x = factor(nome_reg, levels = regions),
              y = tpens, fill = nome_reg,
              weight = pesofit)) +  # per-observation weight
   geom_boxplot(varwidth = TRUE)

but R returns this warning:

Warning message:
The following aesthetics were dropped during statistical transformation: weight
ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor?

How can I solve this?


Solution

  • Despite the warning, specifying the weights appears to do what you expect. The following simple example shows how the weights affect the plot:

    library(ggplot2)

    set.seed(0)
    tmp <- data.frame(x = rnorm(100))     # some random data to plot
    tmp$y <- ifelse(tmp$x > 0, 1, 0.1)    # weight positive values highly

    ggplot(tmp, aes(x = x)) + geom_boxplot()
    

    Output without weights

    ggplot(tmp, aes(x=x, weight=y)) + geom_boxplot()
    #Warning message:
    #The following aesthetics were dropped during statistical transformation: weight
    #ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
    #ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor? 
    

    Output with weights
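To check the effect numerically rather than by eye, one option is to compare the statistics ggplot2 computes for each plot. The sketch below rebuilds the same tmp data and reads the computed medians with layer_data(). It draws the boxes vertically (with a dummy x value "all", which is my addition) so the median column is named middle. As far as I understand, ggplot2 uses the quantreg package for weighted quantiles, so this assumes quantreg is installed.

```r
library(ggplot2)

set.seed(0)
tmp <- data.frame(x = rnorm(100))     # same random data as above
tmp$y <- ifelse(tmp$x > 0, 1, 0.1)    # positive values get ten times the weight

p_unweighted <- ggplot(tmp, aes(x = "all", y = x)) + geom_boxplot()
p_weighted   <- ggplot(tmp, aes(x = "all", y = x, weight = y)) + geom_boxplot()

# layer_data() returns the statistics computed for the first layer.
# suppressWarnings() hides the "dropped aesthetics" warning if your
# ggplot2 version emits it.
unweighted <- layer_data(p_unweighted)
weighted   <- suppressWarnings(layer_data(p_weighted))

# If the weights are used, the weighted median should sit higher,
# because positive values count for more.
unweighted$middle
weighted$middle
```

If the two medians differ, the weight aesthetic is being applied even though the warning says it was dropped.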

    It seems like the warning may be spurious, possibly related to this bug
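If you would rather not depend on how ggplot2 handles the weight aesthetic, an alternative is to compute the weighted box statistics yourself and draw them with geom_boxplot(stat = "identity"). The sketch below is only an illustration: wquantile() is a hypothetical helper using one possible definition of a weighted quantile (it may differ slightly from what ggplot2 computes), the toy data frame contains made-up values in the shape of final2, and the whiskers simply run to the minimum and maximum.

```r
library(ggplot2)

# Hypothetical helper: weighted quantile read off the weighted empirical CDF.
wquantile <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_w <- cumsum(w) / sum(w)                       # cumulative weight share
  sapply(probs, function(p) x[which(cum_w >= p)[1]])
}

# Made-up data with the same column names as final2 (illustrative values only)
set.seed(1)
toy <- data.frame(
  nome_reg = rep(c("Piemonte", "Lombardia", "Liguria"), each = 30),
  tpens    = round(runif(90, 600, 2200)),
  pesofit  = runif(90, 0.2, 3.5)
)

# One row of box statistics per region
stats <- do.call(rbind, lapply(split(toy, toy$nome_reg), function(d) {
  q <- wquantile(d$tpens, d$pesofit, c(0.25, 0.5, 0.75))
  data.frame(nome_reg = d$nome_reg[1],
             ymin = min(d$tpens), lower = q[1], middle = q[2],
             upper = q[3], ymax = max(d$tpens))
}))

# Draw the precomputed statistics; no stat is run, so no weight warning
p <- ggplot(stats, aes(x = nome_reg, ymin = ymin, lower = lower,
                       middle = middle, upper = upper, ymax = ymax,
                       fill = nome_reg)) +
  geom_boxplot(stat = "identity")
p
```

With the real data you would replace toy with the filtered final2. Note that varwidth and outlier points are not available this way, since the boxes are drawn from the summary rows alone.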