I have a dataframe that contains several different analyses, each of which comprises multiple lines of data. I want to calculate the mean of only a few lines from each analysis, and which lines these are varies from analysis to analysis. There is one dataframe containing the analyses and another containing the start and end line numbers desired for the mean of each analysis. A simple example follows.
The analyses
df1 <- structure(list(analysis = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L),
line = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L),
value = c(4.8, 9.8, 7.1, 7.4, 7.4, 3.3, 4.5, 8.7, 3.9, 3.6, 1.9, 9.4, 0.9, 0.6, 2.1, 6.1, 5.3, 4.7, 9.7, 8.6, 8.9, 3.4, 8.7, 7.1)),
class = "data.frame", row.names = c(NA, -24L))
The starting and ending line numbers over which I'd like the mean for each analysis:
df2 <- structure(list(analysis = 1:3,
startnum = c(3L, 3L, 2L),
endnum = c(7L, 6L, 4L)),
class = "data.frame", row.names = c(NA, -3L))
I envision the resulting dataframe would be like this:
structure(list(analysis=1:3,
avgval=c(5.94, 3.2, 7.67)),
class = "data.frame", row.names = c(NA, -3L))
I believe the answer is straightforward with the 'apply' family, but can't think how to do this. I can convert the dataframes to lists for 'lapply', but not sure how to proceed beyond that.
Many thanks in advance, -R
In tidyverse
you can do:
df1 %>%
left_join(df2, by = c("analysis" = "analysis")) %>% #Merging df1 with df2
group_by(analysis) %>% #Grouping by "analysis"
summarise(avgval = mean(value[line >= startnum & line <= endnum])) #Calculating the mean based on given conditions
analysis avgval
<int> <dbl>
1 1 5.94
2 2 3.20
3 3 7.67