How to get two likert variables properly into one (ggplot2) sjp.likert plot of the sjPlot package?

I have a dataframe with two likert variables. I want to plot these two variables by using the sjp.likert function of the sjPlot package. The plot doesn't make sense.

My data (mydf) looks like this:

structure(list(var1 = c(1, 1, 5, NA, 3, NA, 1, NA, 4, 3, 5, 5, 
4, 2, 2, NA, NA, 5, NA, NA), var2 = c(NA, NA, NA, 3, NA, 3, NA, 
5, NA, NA, NA, 2, NA, NA, NA, 4, 4, NA, 1, 1)), .Names = c("var1", 
"var2"), row.names = c(NA, 20L), class = "data.frame")

   var1 var2
1     1   NA
2     1   NA
3     5   NA
4    NA    3
5     3   NA
6    NA    3
7     1   NA
8    NA    5
9     4   NA
10    3   NA
11    5   NA
12    5    2
13    4   NA
14    2   NA
15    2   NA
16   NA    4
17   NA    4
18    5   NA
19   NA    1
20   NA    1

This is the code I use:

library(sjPlot)
library(RColorBrewer)

likert_5 <- mydf
levels_5 <- list(c(1,2,3,4,5))
varnames <- names(likert_5)
sjp.likert(likert_5, legendLabels=levels_5, barColor="brewer",
           legendSize=0.5, axisLabelSize=0.5, valueLabelSize=2,
           colorPalette="BrBG", orderBy="pos", legendPos="bottom",
           axisLabels.y=varnames)

This is the result:

[Image: the resulting sjp.likert plot]

I think you agree that this doesn't make sense at all. The two variable names are the same and there are four levels instead of five. Does anyone know what's going wrong here?

Many thanks in advance!

Solution

I believe this is a bug in the sjp.likert function. Adding arguments one by one, I found that the plot works fine until the argument orderBy = "pos" is included. Examining the source code of the function shows:

sjp.likert
# ...
# questionCount <- nrow(pos)/(length(legendLabels)/2)
# if (!is.null(orderBy)) {
#   ...
#   orderUniqueItems <- rev(1 + questionCount - orderUniqueItems)
#   axisLabels.y <- axisLabels.y[orderUniqueItems]
# }
# ...

Using your data, I end up with the following:

questionCount
# [1] 1.6
orderUniqueItems
# [1] 1.6 0.6
varnames[c(1.6, 0.6)]
# [1] "var1"
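The reason only one label survives is that R truncates fractional indices toward zero and silently drops an index of 0. A minimal base-R sketch of that behaviour, using the two values from above (varnames is recreated here so the snippet stands alone):

```r
# Recreate the label vector and the fractional indices from above
varnames <- c("var1", "var2")
idx <- c(1.6, 0.6)

# Fractional indices are truncated toward zero: 1.6 -> 1, 0.6 -> 0
as.integer(idx)
# [1] 1 0

# An index of 0 selects nothing, so only the first label is returned
varnames[idx]
# [1] "var1"
```

So both axis labels end up being derived from a single name, which matches the duplicated variable name in the plot.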

I think the author actually wanted to round the question count up, i.e. questionCount <- ceiling(nrow(pos)/(length(legendLabels)/2)), which with your data would produce:

questionCount
# [1] 2
orderUniqueItems
# [1] 2 1
varnames[c(2, 1)]
# [1] "var2" "var1"

A quick fix would be to save the returned plot and modify the labels manually (using the author's code to create the labels with 'n=' pasted on).

library(ggplot2)  # needed for scale_x_discrete()

# Append the number of non-missing responses to each variable name
for (i in seq_along(varnames)) {
  varnames[i] <- paste0(varnames[i], sprintf(" (n=%i)", length(na.omit(likert_5[, i]))))
}
myplot <- sjp.likert(likert_5, legendLabels=levels_5, barColor="brewer", legendSize=0.5, axisLabelSize=0.5, valueLabelSize=2, colorPalette="BrBG", orderBy="pos", legendPos="bottom")
# Relabel the axis in the order the items are actually plotted
myplot$plot + scale_x_discrete(labels=varnames[c(2,1)])

Edit:

Regarding the missing middle level, I also found this in the code:

if (!is.null(neutral)) {
  out <- out[out$Response != neutral, ]
}

This deletes the middle 'neutral' category from the output. There doesn't seem to be an option to change this, and none of the author's examples use an odd number of categories. So it seems to be a feature, rather than a bug.
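To illustrate the effect of that filter, here is a small sketch on a toy data frame. The column name Response comes from the snippet above; the Question column, the toy data, and the assumption that neutral holds the middle category (3 on a five-point scale) are mine, not taken from the sjPlot source.

```r
# Toy long-format data: two questions, each with responses 1 to 5
out <- data.frame(
  Question = rep(c("var1", "var2"), each = 5),
  Response = rep(1:5, times = 2)
)

# Assumption: 'neutral' is the middle category of a five-point scale
neutral <- 3

# Same filter as in the function: drop every row in the neutral category
if (!is.null(neutral)) {
  out <- out[out$Response != neutral, ]
}

# Only four response levels remain
sort(unique(out$Response))
# [1] 1 2 4 5
```

This would explain why the plot shows four levels instead of five.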

You might consider the likert package, specifically the function likert.bar.plot with the argument include.center = TRUE.
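A rough sketch of that alternative, assuming the likert package is installed. As far as I know, likert() expects every item to be a factor with the same levels, so the numeric columns are converted first; the object names items and lk are only illustrative.

```r
# The data from the question
mydf <- data.frame(
  var1 = c(1, 1, 5, NA, 3, NA, 1, NA, 4, 3, 5, 5, 4, 2, 2, NA, NA, 5, NA, NA),
  var2 = c(NA, NA, NA, 3, NA, 3, NA, 5, NA, NA, NA, 2, NA, NA, NA, 4, 4, NA, 1, 1)
)

# Convert every column to a factor with the full set of five levels
items <- data.frame(lapply(mydf, factor, levels = 1:5))

# Only attempt the plot when the likert package is available
if (requireNamespace("likert", quietly = TRUE)) {
  lk <- likert::likert(items)
  # include.center = TRUE keeps the middle category in the plot
  print(likert::likert.bar.plot(lk, include.center = TRUE))
}
```

Declaring all five levels explicitly matters here because var2 never takes some values that var1 does, and the items should share one common scale.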