
R - Calculating a combined metric for a dataframe based on another


My dataframe looks like this:

df = data.frame(Region=c(rep("NORDICS",1100),rep("DACH",900),rep("MED",1800),rep("CEE",15000),
                     rep("FRANCE",2000),rep("UK&I",2500)),
            Score=c(sample(seq(from = 1, to = 4, by = 1), size = 1100, replace = TRUE,prob = c(0.6,0.2,0.1,0.1)),
                 sample(seq(from = 1, to = 4, by = 1), size = 900, replace = TRUE,prob = c(0.3,0.3,0.2,0.2)),
                 sample(seq(from = 1, to = 4, by = 1), size = 1800, replace = TRUE,prob = c(0.8,0.1,0.05,0.05)),
                 sample(seq(from = 1, to = 4, by = 1), size = 15000, replace = TRUE,prob = c(0.2,0.2,0.2,0.4)),
                 sample(seq(from = 1, to = 4, by = 1), size = 2000, replace = TRUE,prob = c(0.9,0.05,0.03,0.02)),
                 sample(seq(from = 1, to = 4, by = 1), size = 2500, replace = TRUE,prob = c(0.9,0.05,0.03,0.02))))

The dataframe is a collection of individual scores by region, where each observation is a single response to one question (column Score).

The question is on a scale from 1 to 4.

Based on this dataframe, I calculate a KPI by region from the Score column. The KPI is the number of responses that are either 1 or 2, divided by the total number of responses for a given region.

The code below calculates the KPI by region:

library(dplyr)

KPI_by_Region <- df %>%
  group_by(Region) %>%
  summarise(KPI = sum(Score %in% c(1, 2)) / n())

My Question

Using only the KPI_by_Region dataframe, which contains the KPI by region:

Can I find out the KPI score for all regions combined, without having to run my code over the entire dataframe (df)?


Solution

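  • Not from the KPI values alone: the regions have different numbers of responses, so you also need each region's response count. If you store the count next to the KPI, the combined KPI is the count-weighted mean of the regional KPIs. Does this give the result you are looking for?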

    KPI_by_Region <- df %>%
      group_by(Region) %>%
      summarise(KPI = sum(Score %in% c(1,2))/n(), Count = n())
    
    allRegionsKPI <- sum(KPI_by_Region$KPI * KPI_by_Region$Count) / sum(KPI_by_Region$Count)
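As a quick sanity check of the weighting idea above, here is a minimal sketch on a small made-up dataset. The `toy` dataframe and every variable name in it are hypothetical and are not part of the original question. It uses base R only, and it compares the count-weighted mean of the regional KPIs with the KPI computed directly over all rows, and with the plain unweighted mean.

```r
# Toy data (hypothetical, not the asker's df): two regions of unequal size
toy <- data.frame(
  Region = c(rep("A", 4), rep("B", 6)),
  Score  = c(1, 2, 3, 4,  1, 1, 1, 2, 2, 4)
)

# TRUE where the response counts towards the KPI (score of 1 or 2)
hit <- toy$Score %in% c(1, 2)

# Per-region summary: KPI and number of responses
kpi   <- tapply(hit, toy$Region, mean)
count <- tapply(hit, toy$Region, length)

# Combined KPI recovered from the summary alone (count-weighted mean)
combined_from_summary <- weighted.mean(kpi, count)

# Reference value: KPI computed directly over every row
combined_direct <- mean(hit)

# Unweighted mean of the regional KPIs, for comparison
naive <- mean(kpi)

print(c(weighted = combined_from_summary,
        direct   = combined_direct,
        naive    = naive))
```

On this toy data the weighted and direct values agree, while the unweighted mean differs because the two regions are not the same size. The same should hold for the real `df`, provided the `Count` column is kept in `KPI_by_Region`.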