Tags: r, import, split, dataset, size

Split a dataset file into parts of a specific size


I want to analyze this dataset on a system that limits imports to 100 MB at a time.

How can I split a dataset, by rows, into parts of at most 100 MB each?


Solution

    1. Read the dataset.
    2. Split the dataset by rows into 14 chunks of roughly equal size (with 13 chunks, one file was still over 100 MB).
    3. Save each chunk as its own CSV file using purrr.

    Here is the script I used:

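    library(purrr)

    # Read the full dataset into memory.
    trade <- read.csv("commodity_trade_statistics_data.csv")

    # 13 chunks left one file over 100 MB, so use 14.
    no_of_chunks <- 14

    # Assign each row a chunk index from 1 to no_of_chunks.
    f <- ceiling(seq_len(nrow(trade)) / nrow(trade) * no_of_chunks)

    res <- split(trade, f)

    # Write each chunk to chunk_<index>.csv. walk2() is used instead of map2()
    # because only the side effect matters; row.names = FALSE avoids adding an
    # extra index column to every file.
    walk2(res, paste0("chunk_", names(res), ".csv"), write.csv, row.names = FALSE)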
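
The script above arrives at 14 chunks by trial and error. As a rough sketch of how that search could be automated, the helper below (`split_csv_by_size` is a hypothetical name, not part of base R or purrr) writes the chunks, checks the resulting file sizes with `file.size()`, and retries with one more chunk until every file fits. It assumes the chunks are written with `write.csv(..., row.names = FALSE)` and that rewriting the files a few times is acceptable.

```r
# Hypothetical helper: split a data frame by rows into CSV files that are
# each at most max_bytes in size. Returns the paths of the files written.
split_csv_by_size <- function(df, max_bytes, n_chunks = 1,
                              prefix = "chunk_", dir = ".") {
  stopifnot(nrow(df) > 0, max_bytes > 0, n_chunks >= 1)

  repeat {
    # Assign each row a chunk index from 1 to n_chunks, then split.
    f <- ceiling(seq_len(nrow(df)) / nrow(df) * n_chunks)
    parts <- split(df, f)
    paths <- file.path(dir, paste0(prefix, names(parts), ".csv"))

    for (i in seq_along(parts)) {
      write.csv(parts[[i]], paths[i], row.names = FALSE)
    }

    # Stop once every file fits, or when each chunk is already a single row
    # (a single oversized row cannot be split any further).
    if (all(file.size(paths) <= max_bytes) || n_chunks >= nrow(df)) break

    # Otherwise discard this attempt and try again with one more chunk.
    file.remove(paths)
    n_chunks <- n_chunks + 1
  }

  paths
}
```

To avoid many retries on a large file, a starting guess can be passed in, for example `n_chunks = ceiling(file.size("commodity_trade_statistics_data.csv") / 100e6)` with `max_bytes = 100e6`. Whether the system counts 100 MB as 100e6 or 100 * 1024^2 bytes is an assumption to check against its documentation.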