r dataframe subset large-data

How to make smaller subsets based upon a fixed number of rows repeating over the dataframe


My Problem:

I have a dataframe consisting of 86016000 rows of observations:

  1. there are 512000 observations for each hour
  2. there are 24 hours of data for each of seven days
  3. So 24*7*512000 = 86016000
  4. there are 40 columns (variables)
  5. There is no column of date or datetimestamp
  6. Row numbers alone are enough to identify which observations belong to each day, and there are no errors in the recording of this data.

Given such a large dataset, I want to create subsets of 12288000 (i.e. 24 * 512000) rows each, giving 7 subsets, one per day.

What I tried:

d <- split(PltB_Fold3_1_Data, rep(1:12288000, each=7))

But unfortunately, after almost half an hour, I terminated the process as there was no result.

Is there any better solution than the one above?


Solution

  • You're probably looking for seq rather than rep. With seq, you can generate a sequence of numbers from 0 to 86016000 incremented by 12288000.

    To save resources, you can then use this sequence to generate temporary data frames and do whatever you want with each.

    sequence <- seq(from = 0, to = 86016000, by = 12288000)
    
    for(i in 1:(length(sequence)-1)){
        # Parentheses are required: `:` binds tighter than `+` in R
        temp <- df[(sequence[i] + 1):sequence[i + 1], ]
        # do something here with your temporary data frame
    }
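
If a list of all seven subsets is wanted at once, split() can still be used, provided the grouping vector has one label per row: each day number repeated once for every row of that day, rather than the other way round. The following is a minimal sketch on a scaled-down data frame. The names df, rows_per_day, n_days and day_rows are made up for illustration, and the real data would use rows_per_day <- 24 * 512000. Note that split() copies the data into a list, so on 86016000 rows by 40 columns it may need a lot of memory; indexing one day at a time is likely to be lighter.

```r
# Scaled-down stand-in for the real data: 7 "days" of 6 rows each
# (assumption: the real data would use rows_per_day <- 24 * 512000)
rows_per_day <- 6
n_days <- 7
df <- data.frame(id = seq_len(rows_per_day * n_days),
                 value = rnorm(rows_per_day * n_days))

# One group label per row: 1,1,...,1,2,2,...,2,...,7,7,...,7
day <- rep(seq_len(n_days), each = rows_per_day)

# split() returns a list of 7 data frames, named "1" to "7"
by_day <- split(df, day)

# Alternative: compute the row range of a single day and index directly
# (day_rows is a hypothetical helper, not part of base R)
day_rows <- function(d) ((d - 1) * rows_per_day + 1):(d * rows_per_day)
day3 <- df[day_rows(3), ]
```

Here by_day[["3"]] and day3 hold the same rows; the second form avoids building all seven subsets up front.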