Tags: r, large-data

How can I split large CSV files using R packages like ff or data.table?


I want to split large CSV files (larger than RAM) into pieces and either use them directly or save each piece to disk for later use. Which R package is best suited for this?


Solution

  • You can use read.csv.ffdf from the ff package with chunk-wise reading parameters like this to read the big file:

    library(ff)
    #read the CSV in 1,000,000-row chunks; VERBOSE prints progress for each chunk
    a <- read.csv.ffdf(file = "big.csv", header = TRUE, VERBOSE = TRUE,
                       first.rows = 1000000, next.rows = 1000000, colClasses = NA)
    

    Once the big file has been read into an ff object, you can subset it into ordinary data frames with calls like a[1000:1000000,].
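
    For example, a quick check (a minimal sketch, assuming a has been created as above) confirms that such a subset is an ordinary in-memory data frame, so only the extracted rows occupy RAM:

    #pull one chunk out of the file-backed ff object into RAM
    chunk <- a[1:1000000, ]
    class(chunk)                              # "data.frame"
    print(object.size(chunk), units = "Mb")   #memory used by this chunk alone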

    The rest of the code estimates a chunk size, then subsets the ff object and saves each chunk to disk:

    totalrows = dim(a)[1]

    #estimate the size of one row, in bytes, from a 10000-row sample
    row.size = as.integer(object.size(a[1:10000,])) / 10000  #in bytes

    block.size = 200000000  #target chunk size in bytes (200 MB)
    
    #rows.block is the number of rows per chunk
    rows.block = ceiling(block.size/row.size)
    
    #the loop below writes nmaps+1 chunks, indexed 0..nmaps; using ceiling(...) - 1
    #avoids an empty final chunk when totalrows is an exact multiple of rows.block
    nmaps = ceiling(totalrows/rows.block) - 1
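    
    #worked example (hypothetical numbers): with row.size ~ 100 bytes,
    #rows.block = ceiling(2e8/100) = 2,000,000 rows per chunk, so a table with
    #25,000,000 rows gives nmaps = 12 and the loop below writes 13 files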
    
    
    for(i in (0:nmaps)){
      if(i == nmaps){
        #last chunk: runs to the final row
        df = a[(i*rows.block+1) : totalrows,]
      } else {
        df = a[(i*rows.block+1) : ((i+1)*rows.block),]
      }
      #process df here, or save it to disk as M1.csv, M2.csv, ...
      write.csv(df, paste0("M",i+1,".csv"))
      #free the in-memory chunk before reading the next one
      rm(df)
    }
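
  • The question also mentions data.table. Below is a minimal chunked-reading sketch with fread, not the method from the answer above: chunk_rows, hdr, and the D*.csv output names are illustrative, and the tryCatch simply ends the loop once skip runs past the end of the file.

    library(data.table)
    
    chunk_rows = 1000000   #rows per chunk (illustrative value)
    
    #read zero rows once, just to capture the column names for later chunks
    hdr = names(fread("big.csv", nrows = 0))
    
    i = 0
    repeat{
      #skip the header plus all rows already consumed, then read one chunk;
      #if skip points past the end of the file, fall back to an empty table
      chunk = tryCatch(
        fread("big.csv", skip = i*chunk_rows + 1, nrows = chunk_rows,
              header = FALSE, col.names = hdr),
        error = function(e) data.table())
      if(nrow(chunk) == 0) break
      #process chunk or save it
      fwrite(chunk, paste0("D", i+1, ".csv"))
      i = i + 1
    }

    Note that each fread call has to scan past the skipped lines again, so for very large files the single-pass ff approach above is usually faster; fwrite is used here simply because it writes much faster than write.csv.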