Search code examples
rdataframegroup-bycharactersummarize

Summaries values from column, bases if they contain certain characters


I have a dataset:

Gemeinde <- c("Adliswil", "Adliswil", "Adliswil", "Adliswil", "Adliswil","Adlikon", "Adlikon", "Adlikon", "Adlikon", "Adlikon")
Country <- c("Schweiz", "Deutschland", "Frankreich", "Türkei", "China","Schweiz", "Deutschland", "Frankreich", "Türkei", "China")
Count <- c(23, 41, 32, 58, 26,23, 41, 32, 58, 26)
df <- data.frame(Gemeinde, Country, Count)
print (df)

I need to summarize the data set based if the Country where the people come from is Swiss or not. So group-by Gemeinde, differ if the country is Swiss or not, and then get the sum of the Count for Gemeinde -Swiss or not.

What I tried:

country_count <- country_count %>%
  group_by(Gemeinde) %>%
  summarize(sumSwiss = sum(Anzahl[Staatsangehörigkeit== "Frankreich & Deutschland & Schweiz"]),
            sumForeigners = sum(Anzahl))

All Countries that are

"Frankreich & Deutschland & Schweiz"

Should become Swiss, all the rest should be "Foreign".

What I need

Gemeinde <- c("Adliswil", "Adlikon")
sumSwiss <- 96
sumForeign <- 84

df <- data.frame(Gemeinde, sumSwiss, sumForeign)

print (df)

Solution

  • library(tidyverse) 
    
    df %>%
      mutate(Country = case_when(
        Country %in% c("Frankreich", "Deutschland", "Schweiz") ~ "Swiss",
        TRUE ~ "Foreign"
      )) %>%
      group_by(Gemeinde, Country) %>%
      summarise(sum = sum(Count)) %>%
      pivot_wider(names_from = "Country",
                  values_from = "sum",
                  names_prefix = "sum")
    
    # A tibble: 2 × 3
    # Groups:   Gemeinde [2]
      Gemeinde sumForeign sumSwiss
      <chr>         <dbl>    <dbl>
    1 Adlikon          84       96
    2 Adliswil         84       96