R - Weighted Mean by row for multiple columns based on columns string values


I have a data.frame "DF" with 2020 observations and 79066 variables. The first column, "Year", runs continuously from 1 to 2020; the other variables hold the values.

As a first step, I averaged by row to get one mean value per year.

E.g.

Aver <- apply(DF[,2:79066], 1, mean, na.rm=TRUE)

However, I would like to compute a weighted average, where the weight of each column depends on a number in the column name.

The header is "Year" (first column) followed by 79065 column names. Each name is made of a number from 50 to 300, then ".R" with a number from 1 to 15, then a number from 10 to 30 followed by ".yr". This gives 251 (50-300) x 15 (R) x 21 (10-30) = 79065 columns, e.g.: "Year", "50.R1.10.yr", "50.R1.11.yr", "50.R1.12.yr", ... "50.R1.30.yr", "51.R1.10.yr", "51.R1.11.yr", "51.R1.12.yr", ... "51.R1.30.yr", ... "300.R1.10.yr", "300.R1.11.yr", "300.R1.12.yr", ... "300.R1.30.yr", "50.R2.10.yr", "50.R2.11.yr", "50.R2.12.yr", ... "50.R2.30.yr", "51.R2.10.yr", "51.R2.11.yr", "51.R2.12.yr", ... "51.R2.30.yr", ... "300.R2.10.yr", "300.R2.11.yr", "300.R2.12.yr", ... "300.R2.30.yr", ... "50.R15.10.yr", "50.R15.11.yr", "50.R15.12.yr", ... "300.R15.30.yr".
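
As a sanity check on the naming scheme, the header can be rebuilt in a few lines of R. This is only a sketch: the names `g` and `col_names` are made up for illustration, and the column order is inferred from the examples above (the 10-30 part varying fastest, then 50-300, then R1-R15).

```r
# Hypothetical reconstruction of the header described above.
# expand.grid() varies its first argument fastest, matching the example order.
g <- expand.grid(yr = 10:30, av = 50:300, rep = 1:15)
col_names <- c("Year", paste0(g$av, ".R", g$rep, ".", g$yr, ".yr"))

length(col_names) - 1   # number of value columns: 251 * 15 * 21 = 79065
head(col_names, 4)      # "Year" followed by the first three value columns
tail(col_names, 1)      # last value column
```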

The weight I would like to assign to each column is based on the leading number in its name (50 to 300). Columns starting with "50." should get the most weight and, following a power function, columns starting with "300." the least.

The equation fitting my values is a power function, y = 2305.2 * x^(-1.019), where x is the leading number (50 to 300) and y is the weight.

E.g.

library(dplyr)
# one row per class (50 to 300) with its power-function weight
av.classes <- data.frame(av = seq(50, 300, 1))
av.classes.weight <- av.classes %>% mutate(weight = 2305.2 * av^-1.019)

Thank you for any help.


Solution

  • I guess you could get your weight vector like this:

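    library(tidyverse)
    
    # take the part of each column name before the first "." (50 to 300), skipping "Year"
    weights_precursor <- str_split(names(DF)[-1], pattern = "\\.", n = 2, simplify = TRUE)[, 1] %>% 
      as.numeric()
    
    # power-function weight, one per value column, in column order
    weights <- 2305.2 * weights_precursor ^ -1.019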
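
To go from the weight vector to one weighted mean per year, something along these lines should work. It is a minimal sketch on a toy data frame standing in for `DF`; the names `av`, `w`, `X`, `W`, `WAver` and `check` are illustrative, and it assumes NAs should be skipped with the remaining weights renormalised, which is what `weighted.mean(..., na.rm = TRUE)` does.

```r
# Toy stand-in for DF: a Year column plus a few columns named like the real ones.
set.seed(1)
cols <- c("50.R1.10.yr", "50.R1.11.yr", "300.R1.10.yr", "300.R2.30.yr")
DF <- data.frame(Year = 1:3, matrix(rnorm(12), nrow = 3), check.names = FALSE)
names(DF)[-1] <- cols
DF[2, "50.R1.10.yr"] <- NA  # a missing value, to show the NA handling

# Leading number of each column name (50 to 300), then the power-function weight.
av <- as.numeric(sub("\\..*$", "", names(DF)[-1]))
w  <- 2305.2 * av^-1.019

# Weighted mean per row, skipping NAs and renormalising over the weights present.
X <- as.matrix(DF[, -1])
W <- matrix(w, nrow = nrow(X), ncol = ncol(X), byrow = TRUE)
W[is.na(X)] <- 0
DF$WAver <- rowSums(X * W, na.rm = TRUE) / rowSums(W)

# Same result, row by row, via stats::weighted.mean.
check <- apply(X, 1, weighted.mean, w = w, na.rm = TRUE)
stopifnot(isTRUE(all.equal(DF$WAver, unname(check))))
```

On the full 2020 x 79065 table each numeric matrix is roughly 1.3 GB, and the matrix version builds several of them, so the `apply` form may be the safer choice if memory is tight.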