I'm trying to draw boxplots of wages by area.
This is a sample of my dataset (it is provided by a research institute):
> head(final2, 20)
nquest nord ireg staciv etalav acontrib nome_reg tpens pesofit
1 173 1 18 3 25 35 Calabria 1800 0.3801668
2 2886 1 13 1 26 35 Abruzzo 1211 0.2383701
3 2886 2 13 1 20 42 Abruzzo 2100 0.2383701
4 5416 1 8 3 16 30 Emilia Romagna 700 0.8819879
5 7886 1 9 1 22 35 Toscana 2000 1.2452078
6 20297 1 5 1 14 39 Veneto 1200 1.6694498
7 20711 2 4 1 15 37 Trentino 2000 3.3746801
8 22169 1 15 4 40 5 Campania 600 1.6875562
9 22276 1 8 2 18 37 Emilia Romagna 1200 2.1782894
10 22286 1 8 1 15 19 Emilia Romagna 850 3.0333999
11 22286 2 8 1 15 35 Emilia Romagna 650 3.0333999
12 22657 1 16 1 25 40 Puglie 1400 0.3616937
13 22657 2 16 1 26 36 Puglie 1500 0.3616937
14 23490 1 5 2 23 36 Veneto 1400 0.9763965
15 24147 1 4 1 26 35 Trentino 1730 1.2479984
16 24147 2 4 1 18 45 Trentino 1600 1.2479984
17 24853 1 11 1 18 38 Marche 2180 0.3475683
18 27238 1 12 1 16 31 Lazio 1050 3.6358952
19 27730 1 20 1 15 37 Sardegna 1470 0.7232677
20 27734 1 20 1 16 45 Sardegna 1159 0.6959107
The variables:
nquest = the code of the family
nord = the component (member) of the family
nome_reg = the area where they live
tpens = the wage that each of them earns
pesofit = the weight for each observation
This is the code I'm using:
final2 %>%
  filter(nome_reg == "Piemonte" |
         nome_reg == "Valle D'Aosta" |
         nome_reg == "Lombardia" |
         nome_reg == "Liguria") %>%
  ggplot(aes(x = factor(nome_reg,
                        levels = c("Piemonte", "Valle D'Aosta", "Lombardia", "Liguria")),
             y = tpens, fill = nome_reg)) +
  geom_boxplot(varwidth = TRUE)
Which gives me this plot
Is there a way to draw a weighted boxplot, that is, a boxplot that takes into account the weight of each observation (in this case, each individual's wage tpens in each area)?
I'm already running a weighted regression, so I would like to visualize the weighted data as well.
I've tried adding weight = pesofit inside aes():
final2 %>%
  filter(nome_reg == "Piemonte" |
         nome_reg == "Valle D'Aosta" |
         nome_reg == "Lombardia" |
         nome_reg == "Liguria") %>%
  ggplot(aes(x = factor(nome_reg, levels = c("Piemonte", "Valle D'Aosta", "Lombardia", "Liguria")),
             y = tpens, fill = nome_reg, weight = pesofit)) +
  geom_boxplot(varwidth = TRUE)
but R gives this warning:
Warning message:
The following aesthetics were dropped during statistical transformation: weight
ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor?
How can I solve this?
Based on a simple example, specifying the weights seems to do what's expected despite the warning. The following example shows how the weights affect the plot:
library(ggplot2)
set.seed(0)
tmp <- data.frame(x=rnorm(100)) #Some random data to plot
tmp$y <- ifelse(tmp$x>0, 1, 0.1) #Weight positive values highly
ggplot(tmp, aes(x=x)) + geom_boxplot() #Unweighted boxplot
ggplot(tmp, aes(x=x, weight=y)) + geom_boxplot() #Weighted boxplot, box shifts towards positive values
#Warning message:
#The following aesthetics were dropped during statistical transformation: weight
#ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
#ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor?
It seems like the warning may be spurious, possibly related to this bug
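If you would rather not depend on how ggplot handles the weight aesthetic, one alternative is to compute the weighted quantiles yourself and pass them to geom_boxplot(stat = "identity"). The following is only a minimal sketch under some assumptions: weighted_quantile is a hypothetical helper written for this example (it is not part of ggplot2), the toy data frame merely stands in for final2 with made-up values, and the whiskers here run to the minimum and maximum rather than following the usual 1.5 * IQR rule.

```r
library(ggplot2)

# Hypothetical helper (not a ggplot2 function): the weighted quantile is taken
# as the smallest value whose cumulative share of the weights reaches p.
weighted_quantile <- function(x, w, probs) {
  ord <- order(x)
  x <- x[ord]
  w <- w[ord]
  cum_share <- cumsum(w)
  cum_share <- cum_share / cum_share[length(cum_share)]
  vapply(probs, function(p) x[which(cum_share >= p)[1]], numeric(1))
}

# Toy data standing in for final2 (values are made up for illustration only)
set.seed(0)
toy <- data.frame(
  nome_reg = rep(c("Piemonte", "Liguria"), each = 50),
  tpens    = round(runif(100, 600, 2200)),
  pesofit  = runif(100, 0.2, 3.5)
)

# One row per region holding the five weighted summary statistics
box_stats <- do.call(rbind, lapply(split(toy, toy$nome_reg), function(d) {
  q <- weighted_quantile(d$tpens, d$pesofit, c(0, 0.25, 0.5, 0.75, 1))
  data.frame(nome_reg = d$nome_reg[1],
             ymin = q[1], lower = q[2], middle = q[3],
             upper = q[4], ymax = q[5])
}))

# Draw the boxes from the precomputed statistics instead of the raw data
p <- ggplot(box_stats, aes(x = nome_reg, fill = nome_reg)) +
  geom_boxplot(aes(ymin = ymin, lower = lower, middle = middle,
                   upper = upper, ymax = ymax),
               stat = "identity")
print(p)
```

With the real data you would replace toy with the filtered final2. Note that varwidth = TRUE has nothing to work with in this approach, because the box widths are no longer derived from the raw observations.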