Tags: r, if-statement, plot, kernel-density

R: plot kernel densities for subsets of a dataset (conditional on a char variable)


I want to plot kernel densities for specific subsets of my dataset. The subsets are to be identified by a char variable. My dataset has the following structure (not my actual dataset but the general structure):

Char_var    var1   var2  var3  k_var
Material A                      2
Material B                      5
Material C                      7
Material A                      8
Material C                      4
.                               .
.                               .
.                               .

var1, var2 and var3 are other double columns that are not needed for this plot.

So far, I have coded it like this:

dens1 <-  density(k_var) # How do I add an if statement for the Char_var here?
plot(dens1)

Done this way, I would need to repeat the code above for every material in my dataset. Is there a more elegant way to get the density plots for every material, or do I need to split the data by material as I intended? I actually have more than three materials in my dataset. Thanks!


Solution

  • A simple way to get the densities is tapply, which applies density() to k_var separately within each level of char_var.

    dens <- tapply(dat$k_var, dat$char_var, density)
    
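    To see what that call returns, here is a minimal sketch, assuming the same simulated data as in the data creation code of this answer. It only checks that there is one density object per level of char_var; it is an illustration, not part of the original solution.

    ```r
    # Simulated data, same shape as the data creation code in this answer
    set.seed(1234)
    dat <- data.frame(char_var = rep(LETTERS[1:4], each = 10),
                      k_var = rnorm(40))

    # One "density" object per level of char_var
    dens <- tapply(dat$k_var, dat$char_var, density)

    names(dens)         # group labels: "A" "B" "C" "D"
    class(dens[["A"]])  # "density"
    ```

    Each element can then be passed to plot() or lines() on its own, for example plot(dens[["A"]]).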

    Now the plots. Here all the densities are drawn in the same graph, with shared axis limits computed from every group; treat this as just one example of how to display them.

    dx <- sapply(dens, function(d) range(d$x))
    dy <- sapply(dens, function(d) range(d$y))
    
    xlim <- c(min(dx[1, ]), max(dx[2, ]))
    ylim <- c(min(dy[1, ]), max(dy[2, ]))
    
    # Empty canvas that spans the range of all densities
    plot(0, type = "n", xlim = xlim, ylim = ylim, xlab = "", ylab = "Density")
    for(i in seq_along(dens)){
      # Overlay each group's curve in its own colour, without redrawing the axes
      lines(dens[[i]], col = i)
    }
    
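    Because the curves in a shared graph differ only by colour, a legend helps to tell the materials apart. The sketch below is one possible way to add it, again assuming the simulated data from this answer; the axis label "k_var" and the legend position are arbitrary choices, not something the original answer specifies.

    ```r
    set.seed(1234)
    dat <- data.frame(char_var = rep(LETTERS[1:4], each = 10),
                      k_var = rnorm(40))
    dens <- tapply(dat$k_var, dat$char_var, density)

    # Common axis limits across all groups
    xlim <- range(sapply(dens, function(d) range(d$x)))
    ylim <- range(sapply(dens, function(d) range(d$y)))

    plot(0, type = "n", xlim = xlim, ylim = ylim,
         xlab = "k_var", ylab = "Density")
    for (i in seq_along(dens)) lines(dens[[i]], col = i)

    # Map each colour back to its group label
    legend("topright", legend = names(dens), col = seq_along(dens), lty = 1)
    ```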


    Data creation code (run this first to reproduce the example).

    set.seed(1234)
    dat <- data.frame(char_var = rep(LETTERS[1:4], each = 10),
                      k_var = rnorm(40))
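
    If one plot per material is preferred over a shared graph, the same list of densities can be drawn panel by panel. This is a sketch under the assumption of four groups, as in the simulated data; the 2 x 2 layout and the "Material" titles are illustrative and would need adjusting for a different number of materials.

    ```r
    set.seed(1234)
    dat <- data.frame(char_var = rep(LETTERS[1:4], each = 10),
                      k_var = rnorm(40))
    dens <- tapply(dat$k_var, dat$char_var, density)

    # 2 x 2 grid of panels; keep the old settings so they can be restored
    old_par <- par(mfrow = c(2, 2))
    for (nm in names(dens)) {
      plot(dens[[nm]], main = paste("Material", nm))
    }
    par(old_par)
    ```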