
How do I perform calculations after splitting a dataset into multiple datasets?


I want to take a dataset and split it into multiple datasets. Realistically, I will have thousands of rows, but I would like to simplify the problem here for the purpose of understanding. Suppose you have the following code:

vec = c(1:10)
df = data.frame(vec)
df
   vec
1    1
2    2
3    3
4    4
5    5
6    6
7    7
8    8
9    9
10  10

I would like to split this dataset into chunks of 5 rows each and then get the mean of each chunk.

So far I've tried to split the data in the following manner:

splitdf = split(df, rep(1:2,each = 5))

Now I would like to get the mean of each chunk. For example, the mean of the first chunk is 3 and the mean of the second chunk is 8.
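A minimal base-R sketch of getting each chunk's mean from the list returned by split, assuming the same toy data and grouping as above. The name chunk_means is illustrative only:

```r
# Toy data from the question
df <- data.frame(vec = 1:10)

# Split into two data frames of 5 rows each, as in the question
splitdf <- split(df, rep(1:2, each = 5))

# Apply mean() to the vec column of each chunk; sapply simplifies the
# result to a numeric vector named by the group labels "1" and "2"
chunk_means <- sapply(splitdf, function(d) mean(d$vec))
chunk_means
```

Here chunk_means holds 3 and 8, named "1" and "2" after the split groups.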

Then I would like to repeat each mean with rep and store the result in a separate column, so that my data frame looks like the following:

   vec  mean
1    1     3
2    2     3
3    3     3
4    4     3
5    5     3
6    6     8
7    7     8
8    8     8
9    9     8
10  10     8
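Continuing the split approach, one way to build that column with rep is sketched below. It assumes each chunk's rows are contiguous and appear in the same order as the split groups (true for rep(1:2, each = 5)); chunk_means and chunk_sizes are illustrative names:

```r
df <- data.frame(vec = 1:10)
splitdf <- split(df, rep(1:2, each = 5))

# One mean per chunk, and the number of rows in each chunk
chunk_means <- sapply(splitdf, function(d) mean(d$vec))
chunk_sizes <- sapply(splitdf, nrow)

# Repeat each chunk mean once per row of that chunk; unname() drops
# the group labels so the new column is a plain numeric vector
df$mean <- rep(unname(chunk_means), times = chunk_sizes)
df
```

Using times = chunk_sizes rather than each = 5 keeps the column the right length even if the last chunk is shorter than the others.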

I was wondering whether a loop would be appropriate or whether there is a simpler way to approach this problem. I am open to suggestions.
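For comparison, a loop does work. The sketch below fills the new column one group at a time, assuming the same grouping vector as the question (the name group is illustrative). It is correct but more verbose than a vectorized grouping function:

```r
df <- data.frame(vec = 1:10)
group <- rep(1:2, each = 5)

# Start with an empty numeric column, then fill it group by group
df$mean <- NA_real_
for (g in unique(group)) {
  in_group <- group == g                      # logical index of this group's rows
  df$mean[in_group] <- mean(df$vec[in_group])
}
df
```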


Solution

  • There's no need to split the data if you use the same grouping vector you passed to split as a grouping variable instead, for example in ave:

    df$mean <- ave(df$vec, rep(1:2,each = 5)) 
    df
    
    #   vec mean
    #1    1    3
    #2    2    3
    #3    3    3
    #4    4    3
    #5    5    3
    #6    6    8
    #7    7    8
    #8    8    8
    #9    9    8
    #10  10    8
    

    The default FUN in ave is already mean, so we don't pass it explicitly here.
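With thousands of rows, hard-coding rep(1:2, each = 5) becomes impractical. A sketch that derives the grouping vector from the row count, assuming a fixed chunk length (chunk_size and chunk_id are illustrative names):

```r
df <- data.frame(vec = 1:10)
chunk_size <- 5  # assumed chunk length; change as needed

# 1 for rows 1-5, 2 for rows 6-10, and so on; if nrow(df) is not a
# multiple of chunk_size, the last chunk is simply shorter
chunk_id <- ceiling(seq_len(nrow(df)) / chunk_size)

# Same call as above; FUN = mean is spelled out although it is the default
df$mean <- ave(df$vec, chunk_id, FUN = mean)
df
```

Swapping FUN (for example FUN = median) gives other per-chunk statistics with the same grouping.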