r dplyr r-haven

How can I check the type of a column if it is an unusual type?


I have SPSS data from a .sav file that I am trying to work with in R. Many of the variables are of type haven_labelled, and I would like to convert them to double using mutate_if(). How can I write a predicate for mutate_if() that catches every column of type haven_labelled? I know there is an is.labelled() function in the haven package.


Solution

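  • We can use mutate_if() to apply a function only to the columns that satisfy a predicate. In the reproducible example below (the data is constructed at the end of this answer), the 'Species' column has class haven_labelled, so is.labelled() selects it and it is converted to a factor. Any other conversion function can be passed in place of factor.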

    library(dplyr)
    library(haven)

    # is.labelled() is TRUE only for haven_labelled columns, so only those are converted
    iris1 <- iris %>%
      mutate_if(is.labelled, factor)
    

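    Another option is to build the predicate from the column's class. Use inherits() rather than class(.) == "haven_labelled": class() can return more than one value, and mutate_if() needs a single TRUE or FALSE per column.

    iris1 <- iris %>%
      mutate_if(~ inherits(., "haven_labelled"), factor)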
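
To see why a class-based predicate needs care, here is a small base-R sketch. The class vector on the stand-in object is an assumption about how recent haven releases tag labelled columns. The object is built by hand, so nothing here calls haven itself.

```r
# Hand-built stand-in for a labelled column. The class vector is an assumption;
# a real column would come from haven::labelled() or read_sav().
x <- structure(c(1, 2, 1), class = c("haven_labelled", "vctrs_vctr", "double"))

# Comparing the whole class vector yields one value per class entry
cmp <- class(x) == "haven_labelled"
length(cmp)   # 3

# inherits() collapses the check to a single logical value
is_lab <- inherits(x, "haven_labelled")
is_lab        # TRUE
```

A predicate that returns a single value is what a column-wise selector expects, which is why is.labelled() or inherits() is the safer choice.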
    

    Checking the structure:

    str(iris)
    #'data.frame':  150 obs. of  5 variables:
    # $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
    # $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
    # $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
    # $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
    # $ Species     : 'haven_labelled' chr  "setosa" "setosa" "setosa" "setosa" ...
    #  ..- attr(*, "labels")= Named chr  "S" "ve" "vi"
    #  .. ..- attr(*, "names")= chr  "setosa" "versicolor" "virginica"
    
    str(iris1)
    #'data.frame':  150 obs. of  5 variables:
    # $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
    # $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
    # $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
    # $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
    # $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
    

    Data (run this first so that 'Species' carries the labels):

    data(iris)
    iris$Species <- labelled(as.character(iris$Species),
           c("setosa" = "S", "versicolor" = "ve", "virginica" = "vi"))
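
Since the question asks for double rather than factor, the same predicate can be paired with a numeric conversion. What follows is a minimal sketch, assuming the labelled columns hold numeric codes, as is common for SPSS variables. The data frame `df` and its column names are made up for illustration. zap_labels() is haven's function for stripping value labels, and across(where(...)) is the newer dplyr spelling of mutate_if().

```r
library(dplyr)
library(haven)

# Hypothetical survey data: two labelled columns with numeric codes, one plain column
df <- tibble(
  sex   = labelled(c(1, 2, 2, 1), c(Male = 1, Female = 2)),
  agree = labelled(c(1L, 3L, 2L, 3L), c(No = 1L, Unsure = 2L, Yes = 3L)),
  age   = c(34, 51, 27, 45)
)

# Strip the labels, then coerce the underlying codes to double
df_num <- df %>%
  mutate_if(is.labelled, ~ as.numeric(zap_labels(.x)))

# Same idea written with across() and where()
df_num2 <- df %>%
  mutate(across(where(is.labelled), ~ as.numeric(zap_labels(.x))))

# Inspect the resulting column classes
sapply(df_num, class)
```

A labelled column that stores character values, like 'Species' in the example above, has no numeric codes to recover, so factor or as.character is the more suitable conversion there.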