I have a data frame where the first column represents time and the successive columns (all 49 of them T-T) hold values at those points in time. I'm trying to define time points t1 and t2 to take averages over within each column, and then arrange those averages in a vector so that I can do vector math with it. In other words, the vector that I'm trying to make would have the average of the values from column 2 (remember column 1 is time) from t1 to t2, followed by the average of the values from t1 to t2 of column 3, followed by the average of the values from t1 to t2 of column 4, etc. Finally, I need to make multiple vectors (A, B, and C) for different time points, so for example vector A might have the averages from each column from t1 to t2, while B would have the averages from each column from t3 to t4.
I'm completely lost and largely a total noob when it comes to programming so I hope this makes sense. Any advice is appreciated! Thanks so much :)
Not sure if this counts as a reproducible example, but in essence, I have a table like:
t | col1 | col2 | col3 | col4 |
---|---|---|---|---|
1 | 1.1 | 2.1 | 3.1 | 4.1 |
2 | 1.2 | 2.2 | 3.2 | 4.2 |
3 | 1.3 | 2.3 | 3.3 | 4.3 |
4 | 1.4 | 2.4 | 3.4 | 4.4 |
5 | 1.5 | 2.5 | 3.5 | 4.5 |
and I want to define time points like: t1 = 1 and t2 = 3 so that I can take the average over those points from each column, so that the resulting vector would be of the form:
| 1.2 | 2.2 | 3.2 | 4.2 |
where each entry comes from (1.1+1.2+1.3)/3 , (2.1+2.2+2.3)/3, etc.
Again, super sorry I'm so new to this
There are a few different ways to go about this. I am going to walk through one that will hopefully be easy to understand.
This problem can be broken down into two parts: (1) finding the rows that correspond to t1 and t2, and (2) averaging each column over those rows.
Starting with task 1, this should be fairly simple. In your example, the time value matches the row number. If that is true in your dataset too, then you can simply do:
startRow <- t1
endRow <- t2
However, if that is not true, then you have to find those indices. You can do that in R using the match function. Namely, you would do this:
startRow <- match(t1, df$time)  # row position of t1 in the time column
endRow <- match(t2, df$time)    # row position of t2 in the time column
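To make the lookup concrete, here is a minimal sketch on made-up data where the time values do not line up with the row numbers. The data frame, its column names, and the values of `t1` and `t2` are assumptions for illustration only. Note that `match` needs an exact match and returns `NA` otherwise, so it may be unreliable for non-integer times.

```r
# Hypothetical data: the time column does not equal the row number
df <- data.frame(
  time = c(10, 20, 30, 40, 50),
  col1 = c(1.1, 1.2, 1.3, 1.4, 1.5),
  col2 = c(2.1, 2.2, 2.3, 2.4, 2.5)
)

t1 <- 20
t2 <- 40

# match() gives the position of the first exact match, or NA if there is none
startRow <- match(t1, df$time)
endRow   <- match(t2, df$time)

# The rows whose times are 20, 30 and 40
df[startRow:endRow, ]
```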
Now that we have our start and end indices for our rows, we can subset our dataframe quite easily. All we have to do to get the rows we want is to ask for df[startRow:endRow,], i.e., all the rows from startRow to endRow, both inclusive. Now, all we have to do is get our averages. There are two ways I can think of. One is to use the function lapply like this:
lapply(df[startRow:endRow,],ave)
What this does is apply the ave function to each column in the dataframe. Just discard the time average, since it is useless. Also, note that ave returns a vector of the same length as its input. That is, ave(c(1,2,4)) returns the equivalent of c(2.33,2.33,2.33). So, if you want a vector of the form x1, x2, ..., xn, you have to do this:
averages <- lapply(df[startRow:endRow,], ave)
sapply(averages[2:length(averages)], function(x) x[1])
Here averages[2:length(averages)] selects all the averages except for time, and sapply(averages[2:length(averages)], function(x) x[1]) takes the first element of each column's repeated averages and collects those into a single numeric vector. (Using lapply for this last step would give you a list rather than a vector.)
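As a possibly simpler alternative to the `ave` approach above, base R's `colMeans` computes one mean per column directly. The sketch below rebuilds the example table from the question (the data frame name `df` and the column name `t` are taken from that table) and assumes the time values equal the row numbers.

```r
# The example table from the question
df <- data.frame(
  t    = 1:5,
  col1 = c(1.1, 1.2, 1.3, 1.4, 1.5),
  col2 = c(2.1, 2.2, 2.3, 2.4, 2.5),
  col3 = c(3.1, 3.2, 3.3, 3.4, 3.5),
  col4 = c(4.1, 4.2, 4.3, 4.4, 4.5)
)

startRow <- 1
endRow   <- 3

# ave() with no grouping repeats the mean once per element;
# mean() returns a single number
repeated <- ave(c(1, 2, 4))
single   <- mean(c(1, 2, 4))

# Drop the time column (column 1) and average each remaining column.
# drop = FALSE keeps a data frame even if only one column is left.
averages <- colMeans(df[startRow:endRow, -1, drop = FALSE])
averages
```

The result is a named numeric vector with one entry per data column, so it can be used directly in vector arithmetic.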
The other way to do this would be with a loop. You could do something like this to get the result you want:
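averages <- c()
# Loop over every column except the first (time) column
for (i in 2:ncol(df)) {
  # ave() repeats the mean once per row, so keep only the first value
  colAverage <- ave(df[startRow:endRow, i])[1]
  averages <- c(averages, colAverage)
}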
What you are doing here is going through each of your columns, taking the average, and then appending it to the vector averages.
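Since the question asks for several vectors (A, B, and C) over different time windows, one option is to wrap the steps in a small function. This is only a sketch: `windowMeans` and its arguments are hypothetical names, it selects rows with a range comparison on the time column rather than `match`, and it uses `colMeans` instead of `ave`. It assumes `timeCol` is a numeric column index.

```r
# Hypothetical helper: mean of every non-time column for rows whose
# time lies between tStart and tEnd (both inclusive)
windowMeans <- function(df, tStart, tEnd, timeCol = 1) {
  inWindow <- df[[timeCol]] >= tStart & df[[timeCol]] <= tEnd
  colMeans(df[inWindow, -timeCol, drop = FALSE])
}

# The example table from the question
df <- data.frame(
  t    = 1:5,
  col1 = c(1.1, 1.2, 1.3, 1.4, 1.5),
  col2 = c(2.1, 2.2, 2.3, 2.4, 2.5),
  col3 = c(3.1, 3.2, 3.3, 3.4, 3.5),
  col4 = c(4.1, 4.2, 4.3, 4.4, 4.5)
)

A <- windowMeans(df, 1, 3)  # averages over t = 1..3
B <- windowMeans(df, 3, 5)  # averages over t = 3..5

# Element-wise vector math works on the results
B - A
```

Each call returns a named numeric vector, so A, B, and C can be combined with ordinary arithmetic.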