Tags: r, encoding, utf-8, spss, latin1

Controlling the encoding when parsing an SPSS file with the memisc package


I have been given an SPSS system file that I would like to analyse in R. I use the following code to read the file into R:

library(memisc)
foo <- spss.system.file("foobar.sav")
bar <- subset(foo, select=c(var1,var2,var3))

Looking at the parsed data gives the following:

> bar
Data set with 379 observations and 3 variables

var1       var2        var3
1      gut    weiblich      Herbst
2      gut mnlich      Sommer
3      gut mnlich      Sommer
4      gut mnlich      Winter
5      gut mnlich Fr�hling
6      gut mnlich Fr�hling
7      gut    weiblich Fr�hling
.
.
.
25      gut    weiblich Fr�hling
.. ........ ........... ...........
(27 of 379 observations shown)

The umlauts are mangled: "männlich" shows up as "mnlich" and "Frühling" as "Fr�hling". I am fairly sure the .sav file was saved with Latin-1 encoding. How can I tell spss.system.file() to use that encoding when parsing the file?
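A quick way to test the Latin-1 guess is to look at the bytes. In Latin-1, "ü" is the single byte 0xFC (and "ä" is 0xE4); on its own such a byte is not valid UTF-8, which is why it is displayed as "�" or disappears. The sketch below uses base R only (`validUTF8()` needs R 3.3.0 or later), and the byte vector is a hand-built stand-in for a value from the file, not data actually read from foobar.sav.

```r
# Hand-built stand-in for a string read from the .sav file:
# "Frühling" encoded in Latin-1, where "ü" is the single byte 0xFC
latin1_bytes <- as.raw(c(0x46, 0x72, 0xfc, 0x68, 0x6c, 0x69, 0x6e, 0x67))
x <- rawToChar(latin1_bytes)

# A lone 0xFC byte is not valid UTF-8, hence the garbled display
validUTF8(x)   # FALSE

# Re-interpreting the same bytes as Latin-1 recovers the intended text
y <- iconv(x, from = "latin1", to = "UTF-8")
validUTF8(y)   # TRUE
identical(y, "Fr\u00fchling")   # TRUE
```

If the bytes in the real file convert cleanly this way, Latin-1 (or the closely related Windows-1252) is very likely the right source encoding.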


Solution

  • Thanks, everyone, for your help. I am answering my own question. spss.system.file() reads the strings stored in an SPSS file as-is, without any translation, so the resulting strings carry no encoding information. However, memisc provides a function Iconv() that converts them from one encoding to another, much like the Unix iconv utility:

    > library(memisc)
    > foo <- spss.system.file("foobar.sav")  # strings are read as-is, untranslated
    > foo <- Iconv(foo, from = "latin1", to = "UTF-8")  # re-encode strings from Latin-1 to UTF-8
    > foo <- as.data.frame(as.data.set(foo))
    > head(foo$Geschlecht)
    [1] weiblich männlich männlich männlich männlich männlich
    Levels: männlich weiblich
    

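If the data has already been converted to a data.frame without the Iconv() step, base R's iconv() can do the same re-encoding afterwards. The sketch below assumes that every string in the frame is Latin-1; `latin1_to_utf8` is a hypothetical helper name, not part of memisc, and the small data frame is hand-built for illustration.

```r
# Hypothetical helper (not part of memisc): re-encode every character
# column and every factor's levels from Latin-1 to UTF-8.
# Assumes all strings are Latin-1; running it on UTF-8 text would mangle it.
latin1_to_utf8 <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    if (is.factor(col)) {
      # Only the levels hold the strings, so converting them is enough
      levels(col) <- iconv(levels(col), from = "latin1", to = "UTF-8")
    } else if (is.character(col)) {
      col <- iconv(col, from = "latin1", to = "UTF-8")
    }
    df[[nm]] <- col
  }
  df
}

# Hand-built example: "männlich" in Latin-1, where "ä" is the byte 0xE4
m_latin1 <- rawToChar(as.raw(c(0x6d, 0xe4, 0x6e, 0x6e, 0x6c, 0x69, 0x63, 0x68)))
df <- data.frame(
  geschlecht = factor(c("weiblich", m_latin1, m_latin1),
                      levels = c(m_latin1, "weiblich"))
)

fixed <- latin1_to_utf8(df)
levels(fixed$geschlecht)
```

Converting before as.data.set(), as in the answer above, is still the simpler route because it handles value labels and all variables in one call; the helper is only a fallback for data that has already left memisc.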