
Generate a Q-Q plot for all numeric variables in a data frame using R


I would like to generate a separate Q-Q plot for each numeric variable in a data frame to assess univariate normality (only an x variable is required). The plots do not have to be stored in a list; they only need to be displayed in RStudio.

I've tried several approaches without success, including qqnorm/qqline (base R), various iterations of qplot (ggplot2), and qqPlot (EnvStats), in combination with apply and a for loop. Two of the attempts are shown below. txhousing is from ggplot2.

Use any library you deem appropriate to address the intent of the question.

df <- txhousing

df.num.vec <- names(df)[sapply(df, is.numeric)]

df.num <- df[, df.num.vec]

apply(df.num,2,qqPlot)

This results in a series of warnings:

Warning messages:
1: In is.not.finite.warning(x) :
  There were 568 nonfinite values in x : 568 NA's
2: In FUN(newX[, i], ...) :
  568 observations with NA/NaN/Inf in 'x' removed.
3: In is.not.finite.warning(x) :
  There were 568 nonfinite values in x : 568 NA's
4: In FUN(newX[, i], ...) :
  568 observations with NA/NaN/Inf in 'x' removed.
5: In is.not.finite.warning(x) :
  There were 616 nonfinite values in x : 616 NA's
6: In FUN(newX[, i], ...) :
  616 observations with NA/NaN/Inf in 'x' removed.
7: In is.not.finite.warning(x) :
  There were 1424 nonfinite values in x : 1424 NA's
8: In FUN(newX[, i], ...) :
  1424 observations with NA/NaN/Inf in 'x' removed.
9: In is.not.finite.warning(x) :
  There were 1467 nonfinite values in x : 1467 NA's
10: In FUN(newX[, i], ...) :
  1467 observations with NA/NaN/Inf in 'x' removed.

A second attempt, using a for loop:

df <- txhousing

for (i in seq_along(df)) {
  x <- df[[i]]
  if (!is.numeric(x)) next
  qqPlot(df[,i])
}

This results in:

Error in qqPlot(df[, i]) : 'x' must be a numeric vector

Solution
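
The error in the for loop from the question most likely comes from how tibbles are subset: txhousing is a tibble, and `df[, i]` on a tibble returns a one-column tibble rather than a vector, whereas `df[[i]]` (already stored in `x`) is a plain numeric vector. The sketch below reproduces the distinction in base R with a small made-up data frame (`toy` is hypothetical), using `drop = FALSE` to mimic the tibble behaviour.

```r
# Hypothetical toy data standing in for txhousing
toy <- data.frame(sales = c(10, 25, NA, 40), city = c("a", "b", "c", "d"))

# Mimics tibble-style `[`: the result is still a data frame, not a vector
one_col <- toy[, 1, drop = FALSE]
single_is_numeric <- is.numeric(one_col)   # FALSE: a data frame is a list

# `[[` extracts the column itself
col_vec <- toy[[1]]
double_is_numeric <- is.numeric(col_vec)   # TRUE: a numeric vector
```

In the question's loop, calling qqPlot(x) instead of qqPlot(df[, i]) should therefore avoid the "'x' must be a numeric vector" error.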

  • Since the numeric columns are already stored in df.num, you can loop over df.num directly:

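    # Using ggplot2 to create Q-Q plots and store them in a list
    library(ggplot2)
    # Create an empty list to hold the plots
    qq_list <- list()
    
    # Loop over the numeric columns
    for (var in names(df.num)) {
      # geom_qq() needs the `sample` aesthetic. Passing the column as its own
      # data frame avoids aes() looking up `var` only when the plot is printed,
      # which would make every plot show the last column.
      qq_list[[var]] <- ggplot(data.frame(value = df.num[[var]]), aes(sample = value)) +
        geom_qq() +
        ggtitle(paste0("Q-Q Plot of ", var))
    }
    
    # Print the list of plots
    print(qq_list)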
    

    Using car

    library(car)
    for (i in seq_along(df.num)) {
      # [[i]] extracts the column as a numeric vector (df.num is a tibble)
      qqPlot(df.num[[i]], main = names(df.num)[i])
    }
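
    If no extra package is wanted, the same loop can be sketched with base R's qqnorm() and qqline(). This is only an illustration under assumptions: `airquality` (shipped with base R, and containing NAs) stands in for the data so the block runs on its own, and the 2 x 3 grid is an arbitrary layout choice. For the data in the question, replace `dat` with df.num.

    ```r
    # Stand-in data: airquality ships with base R and contains NAs;
    # swap in df.num for the data from the question.
    dat <- airquality
    num_cols <- names(dat)[sapply(dat, is.numeric)]

    # Arrange the plots in a grid on one device (2 rows x 3 columns here)
    old_par <- par(mfrow = c(2, 3))

    for (v in num_cols) {
      x <- dat[[v]]                              # `[[` yields a plain vector
      qqnorm(x, main = paste("Q-Q plot of", v))  # qqnorm drops NAs itself
      qqline(x)                                  # line through the quartiles
    }

    par(old_par)                                 # restore the previous layout
    ```

    With many columns the grid gets crowded; dropping the par() calls gives one plot per page, which RStudio lets you step through in the Plots pane.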
    

    If you want to save the plots to a file, for example a PDF, you can do the following:

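    myqq <- "qq.pdf"
    
    # Open a PDF device; each plot goes on its own page
    pdf(file = myqq)
    
    for (i in seq_along(df.num)) {
      qqPlot(df.num[[i]], main = names(df.num)[i])
    }
    
    # Close the device so the file is written
    dev.off()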
    

    The file is written to your working directory as 'qq.pdf'.
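
    If a plot call fails partway through, the PDF device stays open and the file may be left incomplete. One way to guard against that is to wrap the steps in a function and close the device with on.exit(). The sketch below is an assumption-laden illustration, not part of the original answer: `save_qq_pdf` is a hypothetical helper name, it uses base qqnorm()/qqline() instead of car's qqPlot() so that it runs without extra packages, and it writes stand-in data (`airquality`) to a temporary file.

    ```r
    # Hypothetical helper: write one Q-Q plot per numeric column to a PDF.
    save_qq_pdf <- function(data, file) {
      pdf(file = file)
      on.exit(dev.off(), add = TRUE)   # close the device even if a plot fails
      num_cols <- names(data)[sapply(data, is.numeric)]
      for (v in num_cols) {
        qqnorm(data[[v]], main = paste("Q-Q plot of", v))
        qqline(data[[v]])
      }
      invisible(num_cols)              # return the plotted column names
    }

    # Usage with stand-in data and a temporary file
    out_file <- tempfile(fileext = ".pdf")
    plotted <- save_qq_pdf(airquality, out_file)
    ```

    For the data in the question, a call such as save_qq_pdf(df.num, "qq.pdf") would be the equivalent.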