Correctly treating NA values in an SPSS .sav file imported into R with the haven package


My platform is Windows 10.

The data in my .sav file looks like this (the screenshots are from PSPP not SPSS):

Data View: [screenshot not preserved]

Variable View: [screenshot not preserved]

I'm using haven to import the .sav file into R:

library("tidyverse")
library("haven")

haven commands (my .sav filename is spss_missing99.sav):

> spss2 <- read_sav("C:/.../spss_missing99.sav")
> spss2

# A tibble: 11 x 1
   Points
    <dbl>
 1      1
 2      2
 3      3
 4      4
 5      5
 6      6
 7      7
 8      8
 9      9
10     10
11     NA


> is.na(spss2)

      Points
 [1,]  FALSE
 [2,]  FALSE
 [3,]  FALSE
 [4,]  FALSE
 [5,]  FALSE
 [6,]  FALSE
 [7,]  FALSE
 [8,]  FALSE
 [9,]  FALSE
[10,]  FALSE
[11,]   TRUE

> mean(spss2)

[1] NA
Warning message:
In mean.default(spss2) : argument is not numeric or logical: returning NA


> mean(spss2, na.rm = TRUE)

[1] NA
Warning message:
In mean.default(spss2, na.rm = TRUE) :
  argument is not numeric or logical: returning NA

My question: why do the last two mean() calls return NA with a warning instead of the mean?

Thanks.


Solution

  • mean() expects a numeric or logical vector, but you are passing it a whole data frame (tibble), so it warns and returns NA. Extract the column first:

    mean(spss2$Points, na.rm = TRUE)
    #[1] 5.5
    

    To average every column at once, pass the data frame to colMeans(), which returns the column-wise means:

    colMeans(spss2, na.rm = TRUE)
    
    #Points 
    #   5.5
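
The difference between the two calls can be reproduced without the .sav file. The sketch below uses base R only and builds a hypothetical stand-in for spss2 (a plain data frame with the same eleven values rather than the tibble that read_sav() returns); the names whole, col_mean, col_name and all_means are made up for illustration.

```r
# Hypothetical stand-in for the imported data: ten scores plus one missing value.
spss2 <- data.frame(Points = c(1:10, NA))

# A data frame is a list of columns, not a numeric vector, so mean()
# warns and returns NA regardless of na.rm. The warning is suppressed here.
whole <- suppressWarnings(mean(spss2, na.rm = TRUE))

# Extracting the column with $ yields a vector that mean() can handle.
col_mean <- mean(spss2$Points, na.rm = TRUE)

# [[ ]] does the same and is handy when the column name is stored in a variable.
col_name <- "Points"
col_mean2 <- mean(spss2[[col_name]], na.rm = TRUE)

# sapply() calls mean() on each column in turn, much like colMeans().
all_means <- sapply(spss2, mean, na.rm = TRUE)

print(whole)
print(col_mean)
print(all_means)
```

The same extraction should work on the tibble returned by read_sav(), since tibbles also support $ and [[ ]].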