r ggplot2 dplyr subset

ggplot selection: should I use 'subset' or 'ifelse' to filter data?


I'm trying to draw two density plots together, one for each of two variables that come from two different datasets.

My datasets are something like these:

dataset1

Real Wage 1    PPA
1244           105
1577           90
1865           105
1756           105
1634           90
1273           90
2719           105
...            ....

dataset2

Real Wage 2    PPA
1233           105
1588           90
1265           105
1743           105
1224           90
1983           90
2449           105
...            ....

And this is my script

ggplot() + 
  geom_density( aes( x = dataset1$`Real Wage 1`), fill = "red",  alpha = 0.5)+
  geom_density( aes( x = dataset2$`Real Wage 2`), fill = "blue", alpha = 0.5)+
  theme_classic()

It works great, but now I want to plot Real Wage 1 and Real Wage 2 only for the observations with a specific value of PPA.

Of course I cannot use the filter function because I'm working with two different datasets. Therefore I've tried to subset each variable:

ggplot() + 
  geom_density( aes( x = subset(dataset1$`Real Wage 1`, PPA == 105)), fill = "red",  alpha = 0.5)+
  geom_density( aes( x = subset(dataset2$`Real Wage 2`, PPA ==105)), fill = "blue", alpha = 0.5)+
  theme_classic()

But it doesn't work because (I suppose) by specifying the variable with $ I'm already excluding all the other columns, and hence subset can't find PPA to apply the logical condition.

I know that it is possible to filter data using the ifelse function, but so far every time I've tried to use it, it hasn't worked (probably because I'm not applying it correctly).

Can anyone help me?


Solution

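  • ggplot expects a data frame to work with, rather than a vector, and you are also using the subset function incorrectly: once you extract the column with $, subset receives a bare vector and has no PPA column to evaluate the condition against.

    The way to do this is to pass the subsetted data frame as the data argument of each geom_density term and refer to the column by name inside aes: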

    + geom_density(data = subset(dataset1, PPA==105), aes(x = `Real Wage 1`), ...)
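For completeness, here is a minimal sketch of the whole plot built that way. The two data frames are toy stand-ins made from the seven rows shown in the question (the real datasets are presumably larger), and the names wage1_105 and wage2_105 are introduced here purely for illustration.

```r
library(ggplot2)

# Toy data copied from the rows shown in the question.
# check.names = FALSE keeps the spaces in the column names.
dataset1 <- data.frame(
  `Real Wage 1` = c(1244, 1577, 1865, 1756, 1634, 1273, 2719),
  PPA           = c(105, 90, 105, 105, 90, 90, 105),
  check.names   = FALSE
)
dataset2 <- data.frame(
  `Real Wage 2` = c(1233, 1588, 1265, 1743, 1224, 1983, 2449),
  PPA           = c(105, 90, 105, 105, 90, 90, 105),
  check.names   = FALSE
)

# Subset the whole data frame, so PPA is still available to the condition.
# (wage1_105 and wage2_105 are illustrative names, not from the question.)
wage1_105 <- subset(dataset1, PPA == 105)
wage2_105 <- subset(dataset2, PPA == 105)

# Each layer gets its own filtered data frame; aes() refers to columns by name.
p <- ggplot() +
  geom_density(data = wage1_105, aes(x = `Real Wage 1`), fill = "red",  alpha = 0.5) +
  geom_density(data = wage2_105, aes(x = `Real Wage 2`), fill = "blue", alpha = 0.5) +
  labs(x = "Real wage") +
  theme_classic()

print(p)
```

Because every layer takes its own data argument, having two datasets does not rule out filter either: dplyr::filter(dataset1, PPA == 105) could be passed as data in place of the subset call.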