
How to read a large dataset in R


Possible Duplicate:
Quickly reading very large tables as dataframes in R

Hi,

while trying to read a large dataset into R, the console displayed the following errors:

data <- read.csv("UserDailyStats.csv", sep = ",", header = TRUE,
                 na.strings = "-", stringsAsFactors = FALSE)
data <- data[complete.cases(data), ]
dataset <- data.frame(
  user_id                 = as.character(data[, 1]),
  event_date              = as.character(data[, 2]),
  day_of_week             = as.factor(data[, 3]),
  distinct_events_a_count = as.numeric(as.character(data[, 4])),
  total_events_a_count    = as.numeric(as.character(data[, 5])),
  events_a_duration       = as.numeric(as.character(data[, 6])),
  distinct_events_b_count = as.numeric(as.character(data[, 7])),
  total_events_b          = as.numeric(as.character(data[, 8])),
  events_b_duration       = as.numeric(as.character(data[, 9]))
)
Error: cannot allocate vector of size 94.3 Mb
In addition: Warning messages:
1: In data.frame(user_msisdn = as.character(data[, 1]), calls_date = as.character(data[,  :
  NAs introduced by coercion
2: In data.frame(user_msisdn = as.character(data[, 1]), calls_date = as.character(data[,  :
  NAs introduced by coercion
3: In class(value) <- "data.frame" :
  Reached total allocation of 3583Mb: see help(memory.size)
4: In class(value) <- "data.frame" :
  Reached total allocation of 3583Mb: see help(memory.size)

Does anyone know how to read large datasets like this? UserDailyStats.csv is approximately 2 GB.


Solution

  • Sure:

    1. Get a bigger computer, in particular more RAM.
    2. Run a 64-bit OS; see point 1 about more RAM, now that you can actually use it.
    3. Read only the columns you need (see the first sketch below).
    4. Read fewer rows (also shown in the first sketch).
    5. Read the data in binary rather than re-parsing 2 GB of text every time, which is mighty inefficient (see the second sketch, at the end of this answer).
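    For points 3 and 4, a minimal sketch: read.csv() accepts a colClasses vector ("NULL" drops a column entirely) and an nrows cap. The nine column types below are guesses based on the code in the question.

    ## Declaring types up front also avoids the as.numeric(as.character(...))
    ## coercion pass, which duplicated the data in memory.
    cc <- c("character",        # user id
            "character",        # event date
            "factor",           # day of week
            rep("numeric", 6))  # the six count/duration columns

    data <- read.csv("UserDailyStats.csv", header = TRUE, na.strings = "-",
                     colClasses = cc)

    ## Drop a column by marking it "NULL", and test on a sample of rows
    ## first with nrows before committing to the full 2 GB:
    cc2 <- cc
    cc2[3] <- "NULL"            # e.g. skip the day-of-week column
    sample_data <- read.csv("UserDailyStats.csv", na.strings = "-",
                            colClasses = cc2, nrows = 10000)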

    There is also a manual for this (R Data Import/Export) at the R site.
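    For point 5, a sketch of the binary round-trip, assuming you can get through one full parse (e.g. overnight or on a bigger machine): parse the CSV a single time, save the result in R's binary serialization format, and reload that in later sessions.

    ## One expensive text parse, then a fast binary copy for reuse.
    data <- read.csv("UserDailyStats.csv", na.strings = "-",
                     stringsAsFactors = FALSE)
    saveRDS(data, "UserDailyStats.rds")

    ## Later sessions skip the text parsing entirely:
    data <- readRDS("UserDailyStats.rds")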