Tags: r, chunking

Automating chunking of big data in R using a loop


I am trying to break a very large dataset into chunks. My code currently looks like this:

#Chunk 1
data <- read.csv("/Users/admin/Desktop/data/sample.csv", header=T, nrows=1000000)
write.csv(data, "/Users/admin/Desktop/data/data1.csv")

#Chunk 2
data <- read.csv("/Users/admin/Desktop/data/sample.csv", header=F, nrows=1000000, skip=1000000)
write.csv(data, "/Users/admin/Desktop/data/data2.csv")

#Chunk 3
data <- read.csv("/Users/admin/Desktop/data/sample.csv", header=F, nrows=1000000, skip=2000000)
write.csv(data, "/Users/admin/Desktop/data/data3.csv")

There are hundreds of millions of rows in my dataset, so I need to create a lot of chunks, and I would really like to automate the process. Is there a way to loop this so that each chunk automatically skips 1,000,000 more rows than the previous one, and each file is saved as "dataN.csv", where N is one greater than the number in the previous file's name?


Solution

  • What about the following approach? For the demonstration I created a data frame with two columns and ten lines, then read it in a loop in two passes of five lines each, saving each chunk to its own text file:

    f <- "C:/Users/MyPC/Desktop/"
    for(i in 1:2){
        # Each pass skips the 5 * (i - 1) lines already read by earlier passes
        df <- read.table("C:/Users/MyPC/Desktop/df.txt", header = FALSE,
                         nrows = 5, skip = 5 * (i - 1))
        file <- paste0(f, "df", i, ".txt")
        write.table(df, file)
    }
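
  • Scaled up to the question's file, a minimal sketch might look like the following (the paths and the 1,000,000-row chunk size are taken from the question; the stopping rule and the header handling are my additions). One detail worth noting: once the header line has been consumed, every later chunk must skip one extra line, which the code in the question misses.

    chunk_size <- 1000000
    in_file <- "/Users/admin/Desktop/data/sample.csv"
    out_dir <- "/Users/admin/Desktop/data/"

    # Read the header once so every chunk keeps the column names
    col_names <- names(read.csv(in_file, header = TRUE, nrows = 1))

    i <- 1
    repeat {
        # Skip the header line plus all rows consumed by earlier chunks;
        # read.csv raises an error once skip runs past the end of the file
        chunk <- tryCatch(
            read.csv(in_file, header = FALSE, nrows = chunk_size,
                     skip = 1 + (i - 1) * chunk_size, col.names = col_names),
            error = function(e) NULL
        )
        if (is.null(chunk)) break
        # row.names = FALSE avoids writing R's row-name column into the output
        write.csv(chunk, paste0(out_dir, "data", i, ".csv"), row.names = FALSE)
        if (nrow(chunk) < chunk_size) break  # last, partial chunk
        i <- i + 1
    }

    Be aware that read.csv rereads the file from the top on every pass, so skip gets increasingly expensive. For hundreds of millions of rows, data.table::fread() (which also accepts skip and nrows) or readr::read_csv_chunked() would be considerably faster.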