I want to take a dataset and split it into multiple datasets. Realistically, I will have thousands of rows, but here is a simplified version of the problem for the purpose of understanding. Suppose you have the following code:
vec = c(1:10)
df = data.frame(vec)
df
vec
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
10 10
I would like to split this dataset into chunks of 5 rows each and then get the mean of each chunk.
So far I've tried to split the data in the following manner:
splitdf = split(df, rep(1:2,each = 5))
Now I would like to get the mean of each group. For example, the mean of the first chunk is 3 and the mean of the second chunk is 8.
Then I would like to repeat each mean with rep() and store the result in a separate column. I want my data frame to look like the following:
vec mean
1 1 3
2 2 3
3 3 3
4 4 3
5 5 3
6 6 8
7 7 8
8 8 8
9 9 8
10 10 8
I was wondering whether a loop would be appropriate or if there's a simpler way to go about this problem. I am open to suggestions.
There is no need to split the data if you reuse the grouping vector from your split call as the grouping argument of another function, for example ave:
df$mean <- ave(df$vec, rep(1:2,each = 5))
df
# vec mean
#1 1 3
#2 2 3
#3 3 3
#4 4 3
#5 5 3
#6 6 8
#7 7 8
#8 8 8
#9 9 8
#10 10 8
The default function in ave is mean already, so we don't apply it explicitly here.
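Since the real data has thousands of rows, the hard-coded rep(1:2, each = 5) will not scale. Here is a minimal sketch of one way to build the grouping vector from the row count instead. The name chunk_size and the 12-row example data are assumptions for illustration, not part of the question.

```r
# Example data with 12 rows, so the last chunk is incomplete (assumed for illustration).
df <- data.frame(vec = 1:12)

# chunk_size is a hypothetical name for the number of rows per chunk.
chunk_size <- 5

# Rows 1-5 get group 1, rows 6-10 get group 2, and so on.
# A final partial chunk (rows 11-12 here) becomes its own group.
grp <- ceiling(seq_len(nrow(df)) / chunk_size)

# ave() returns the group mean repeated for every row of that group.
df$mean <- ave(df$vec, grp)
df
```

With this grouping, the number of rows does not need to be a multiple of the chunk size. The leftover rows are simply averaged among themselves.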