Search code examples
rdata-processing

How to calculate values in one dataframe based on the values in another dataframe


I have a dataframe that contains several different analyses, each of which comprises multiple lines of data. I want to calculate the mean of only a few lines from each analysis, and which lines these are varies from analysis to analysis. There is one dataframe containing the analyses and another containing the start and end line numbers desired for the mean of each analysis. A simple example follows.

The analyses

df1 <- structure(list(analysis = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), 
               line = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L),
               value = c(4.8, 9.8, 7.1, 7.4, 7.4, 3.3, 4.5, 8.7, 3.9, 3.6, 1.9, 9.4, 0.9, 0.6, 2.1, 6.1, 5.3, 4.7, 9.7, 8.6, 8.9, 3.4, 8.7, 7.1)),
          class = "data.frame", row.names = c(NA, -24L))

The starting and ending line numbers over which I'd like the mean for each analysis:

df2 <- structure(list(analysis = 1:3,
               startnum = c(3L, 3L, 2L),
               endnum = c(7L, 6L, 4L)),
          class = "data.frame", row.names = c(NA, -3L))

I envision the resulting dataframe would be like this:

structure(list(analysis=1:3,
               avgval=c(5.94, 3.2, 7.67)),
          class = "data.frame", row.names = c(NA, -3L))

I believe the answer is straightforward with the 'apply' family, but can't think how to do this. I can convert the dataframes to lists for 'lapply', but not sure how to proceed beyond that.

Many thanks in advance, -R


Solution

  • In tidyverse you can do:

    df1 %>%
     left_join(df2, by = c("analysis" = "analysis")) %>% #Merging df1 with df2
     group_by(analysis) %>% #Grouping by "analysis"
     summarise(avgval = mean(value[line >= startnum & line <= endnum])) #Calculating the mean based on given conditions
    
      analysis avgval
         <int>  <dbl>
    1        1   5.94
    2        2   3.20
    3        3   7.67