Add factor levels that are not in use


I have a simple question that I cannot solve: I want to plot a data.frame (a month) made of factors, where some levels are sometimes missing. R then keeps only the levels that are actually present, so my plots differ depending on whether one, two or more levels are present.

Here is an example:

    library(ggplot2)
    library(reshape2)

    f             <- factor(c("Free", "Work"))
    mon           <- as.data.frame(matrix(as.factor(rep(f[2], times = 8)), nrow = 4))
    colnames(mon) <- c("A", "B")

    mt    <- t(as.matrix(rev(data.frame(as.matrix(mon))))) #  change order of y
    m     <- melt(mt)

    col   <- c("azure",  "orange")

    ggplot(m, aes(x = Var2, y = Var1, fill = value)) +
      geom_tile(colour="grey10") +
      scale_fill_manual(values = col, labels = f, name = NULL) +
      theme(panel.background = element_rect(fill = "white"), axis.ticks = element_blank()) +
      theme(axis.title.x = element_blank(), axis.title.y = element_blank())

As you can see, I assign the second of the 2 levels, "Work", to every element, but the plot shows "Free". What bothers me is that the factors in mon have only 1 level instead of the 2 possible levels. The plot comes out differently if I assign several levels to mon:

    mon   <- as.data.frame(matrix(as.factor(rep(c(f[1], f[2]), times = 4)), nrow = 4))

... and re-run the plot above. It is also not possible to assign another level afterwards, even though it is one of the original 2 levels:

    mon[1,1] <- f[1]

I tried a lot with levels, relevel, order etc. without success. Does anyone have an idea?


Solution

  • Matrices can't hold factors. When you put a factor in a matrix, it gets coerced to character, and the unused levels are lost. as.data.frame(matrix(...)) is a bad habit for this reason (and for other class-conversion reasons).
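
    A minimal sketch of that coercion in isolation, using only base R (the names x and mat are just for illustration):

    ```r
    f <- factor(c("Free", "Work"))
    x <- rep(f[2], 4)            # factor with levels "Free" and "Work", all values "Work"
    levels(x)                    # "Free" "Work"

    mat <- matrix(x, nrow = 2)   # matrix() stores the labels as plain strings
    is.character(mat)            # TRUE: the factor class is gone
    levels(mat)                  # NULL: and so are the levels
    ```

    Once the data is character, nothing remembers that "Free" was ever a possible value, so anything rebuilt from the matrix only knows about "Work".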

    Here's a way to replicate your data transformations as near as I can follow them without losing factor levels:

    f <- factor(c("Free", "Work"))
    x <- rep(f[2], 4)
    mon <- data.frame(A = x, B = x)
    str(mon)
    # 'data.frame': 4 obs. of  2 variables:
    #  $ A: Factor w/ 2 levels "Free","Work": 2 2 2 2
    #  $ B: Factor w/ 2 levels "Free","Work": 2 2 2 2
    ## looks good
    
    # What is y? What's the point?
    #mt    <- t(as.matrix(rev(data.frame(as.matrix(mon))))) #  change order of y
    
    mon$id <- seq_len(nrow(mon))
    m     <- reshape2::melt(mon, id.vars = "id", factorsAsStrings = FALSE)
    
    levels(m$value)
    # [1] "Free" "Work"
    ## looks good
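
    With both levels kept, the assignment from the question (mon[1,1] <- f[1]) should go through as well. A small base-R sketch of the difference; the one-level factor one is a made-up stand-in for a column that lost its "Free" level:

    ```r
    f   <- factor(c("Free", "Work"))
    x   <- rep(f[2], 4)
    mon <- data.frame(A = x, B = x)

    mon[1, 1] <- f[1]                    # "Free" is a known level, so this works
    as.character(mon$A)                  # "Free" "Work" "Work" "Work"

    # hypothetical stand-in for a column that only knows the level "Work"
    one <- factor(rep("Work", 4))
    suppressWarnings(one[1] <- "Free")   # unknown level: R warns and stores NA
    is.na(one[1])                        # TRUE
    ```

    The warning that is suppressed here ("invalid factor level, NA generated") is the usual sign that a level went missing somewhere upstream.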
    

    Now, when we get to plotting, specify drop = FALSE in the scale to include unused levels in the legend. (Use the default drop = TRUE if you don't want the unused levels showing up.) Since the levels are already there, we don't need to customize the labels.

    col   <- c("azure",  "orange")
    
    ggplot(m, aes(x = id, y = variable, fill = value)) +
      geom_tile(colour="grey10") +
      scale_fill_manual(values = col, name = NULL, drop = FALSE) +
      theme(panel.background = element_rect(fill = "white"), axis.ticks = element_blank()) +
      theme(axis.title.x = element_blank(), axis.title.y = element_blank()) 
    

    If you want to be extra safe with the color scale, you can add names to the values vector before putting it in the scale:

    names(col) = levels(f)
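
    The reason this helps: when values is a named vector, scale_fill_manual() matches colours to levels by name rather than by position. A base-R sketch of the two kinds of lookup, as an illustration only (it mimics the matching and does not call ggplot2):

    ```r
    f   <- factor(c("Free", "Work"))
    col <- c("azure", "orange")
    names(col) <- levels(f)                  # azure -> "Free", orange -> "Work"

    x <- rep(f[2], 4)                        # data that only contains "Work"
    unname(col[as.character(x)])             # lookup by name: "orange" every time

    # positional lookup once the unused level is gone picks the first colour
    unname(col)[as.integer(droplevels(x))]   # "azure" every time
    ```

    The positional case is roughly what happened in the question: with only one level left, "Work" landed on the first colour.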
    

    Another way to get the data would be to not worry about the levels during transformation, and re-factor with appropriate levels at the end:

    # your original code:
    f             <- factor(c("Free", "Work"))
    mon           <- as.data.frame(matrix(as.factor(rep(f[2], times = 8)), nrow = 4)) 
    colnames(mon) <- c("A", "B")
    
    mt    <- t(as.matrix(rev(data.frame(as.matrix(mon))))) #  change order of y
    m     <- melt(mt)
    
    # add this at the end
    m$value = factor(m$value, levels = levels(f))
    
    # check that it looks good:
    str(m$value)
    # Factor w/ 2 levels "Free","Work": 2 2 2 2 2 2 2 2
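
    The same re-factoring step also works on a plain character vector. That matters because, since R 4.0.0, as.data.frame() no longer converts strings to factors by default, so m$value may come out as character rather than as a one-level factor. A short sketch, with value as a made-up stand-in for m$value:

    ```r
    f <- factor(c("Free", "Work"))

    value <- c("Work", "Work", "Work")           # hypothetical stand-in for m$value
    value <- factor(value, levels = levels(f))   # restore the full set of levels

    levels(value)                # "Free" "Work"
    table(value)                 # "Free" is listed with a count of 0
    nlevels(droplevels(value))   # 1: droplevels() would remove the unused level again
    ```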