Tags: r, macos, large-data

Reading a 6 GB Stata (.dta) dataset into R


I have a 6.1 GB data file on my iMac (macOS Catalina 10.15.4, 3.1 GHz processor). I have tried several ways to read the file into my R global environment:

library(foreign)
data <- read.dta(file = "File.dta", missing.type = TRUE)

install.packages("readstata13")
library(readstata13)
data <- read.dta13(file = "File.dta")

library(haven)
data <- read_dta('File.dta')

library(memisc)
# memisc imports Stata files via Stata.file(), not as.data.frame() on a path
data <- as.data.set(Stata.file("File.dta"))

Each way, I get the same error: `Error: vector memory exhausted (limit reached?)`

I have tried to address this with the following commands to increase the available memory:

memory.limit(size = 12000)                # Windows-only; has no effect on macOS
Sys.setenv('R_MAX_VSIZE' = 32000000000)
options(scipen = 999)                     # only changes number printing, not memory

But none of this has worked.
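(One likely reason the second command had no effect: `R_MAX_VSIZE` is only read when R starts, so setting it with `Sys.setenv()` inside a running session does not raise the limit. The usual fix on macOS is to put it in `~/.Renviron` and restart R. A sketch, reusing the 32 GB value above; adjust it to your machine's RAM:)

```r
# Append the limit to ~/.Renviron so it is picked up at the next R startup.
# (This writes to your home directory.)
cat("R_MAX_VSIZE=32Gb\n", file = "~/.Renviron", append = TRUE)
# Restart R for the new limit to take effect.
```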

Has anyone had this problem with a Mac and been able to fix this?


Solution

  • The best approach was to read in only the columns that were actually needed:

    data <- read_dta("032720.dta",
                     col_select = c("WP5AA", "YEAR_WAVE", "WP16", "WP18", "WP23",
                                    "WP2319", "INCOME_5", "WP119", "WP5358",
                                    "WP128", "EMP_2010", "WP1219", "WP1220",
                                    "WP1223", "WP1230", "WP1233Recoded",
                                    "income_2", "WP3117", "WP60", "WP63", "WP67"))
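If even the selected columns are too large to hold at once, `haven::read_dta()` also accepts `skip` and `n_max` arguments, so the file can be processed in row chunks. A sketch (the chunk size and the column names are placeholders; the processing step is up to you):

```r
library(haven)

chunk_size <- 500000  # rows per chunk; tune to available memory
offset <- 0
repeat {
  chunk <- read_dta("032720.dta", skip = offset, n_max = chunk_size,
                    col_select = c("WP5AA", "YEAR_WAVE", "WP16"))
  if (nrow(chunk) == 0) break  # no rows left: stop
  # ... aggregate or summarise each chunk here, then let it be garbage-collected ...
  offset <- offset + chunk_size
}
```

This keeps only one chunk in memory at a time, at the cost of re-opening the file for each chunk.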