How to add a row which sums one column for every value in another column?


I have the following data frame:

 df <- structure(list(Number = c("3117", "3118", "3119", "3120", "3121", 
"3122"), City = c("Акмолинская", "Актюбинская", "Алматинская", 
"Атырауская", "ЗКО", "Жамбылская"), Year = c("2001", "2001", 
"2001", "2001", "2001", "2001"), Info = c("Среднегодовая численность населения РК (чел.)", 
"Среднегодовая численность населения РК (чел.)", "Среднегодовая численность населения РК (чел.)", 
"Среднегодовая численность населения РК (чел.)", "Среднегодовая численность населения РК (чел.)", 
"Среднегодовая численность населения РК (чел.)"), Value = c("765690", 
"669198", "1554447", "445631", "600987", "980563"), Status = c("Факт", 
"Факт", "Факт", "Факт", "Факт", "Факт")), row.names = c(NA, 6L
), class = "data.frame")

I need to sum the Value column for each Year and add the result as a new row with "Республика Казахстан" (Republic of Kazakhstan) in the City column. In other words, for each year I need the total of Value across all cities, labelled with the country name in the City column. How can I do that?

I tried this code, but it gives me an "invalid 'type' (character) of argument" error:

for (year in unique(df$Year)) {
  df[nrow(df) + 1,] = c("0","Республика Казахстан", year, "Среднегодовая численность населения РК (чел.)", sum(df[which(df[,3]==year),5]), "Факт")
}

Solution

  • (Up front: my Emacs/ESS isn't displaying the UTF-8 strings, so they look empty in the output here. They are not empty in the data.)

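    First, Value has to be numeric before it can be summed: sum() does not accept character input, which is what triggers the error in the question. From there, summarize by Year and then append the result to the original data.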
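
    To make the cause of the error concrete, here is a minimal sketch that uses only the Value strings from the question. The variable names value_chr, value_num, err and total are introduced here for illustration. Note that the loop in the question has a second, related issue: c() with mixed strings and numbers coerces everything to character, so the appended row would store the sum as a string again.

    ```r
    # Values copied from the question's Value column (stored as character)
    value_chr <- c("765690", "669198", "1554447", "445631", "600987", "980563")

    # sum() rejects character input; capture the error message instead of stopping
    err <- tryCatch(sum(value_chr), error = function(e) conditionMessage(e))
    print(err)

    # After conversion to numeric the total can be computed
    value_num <- as.numeric(value_chr)
    total <- sum(value_num)
    print(total)
    ```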

    base R

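    # Value is stored as character; convert it so sum() works
    df$Value <- as.numeric(df$Value)
    # One total per Year, labelled in the City column
    newdf <- transform(aggregate(Value ~ Year, data = df, FUN = sum), City = "City Sum")
    # Add the remaining columns (Number, Info, Status), filled with NA
    newdf <- cbind(newdf, df[,setdiff(names(df), names(newdf))][0,][NA,])
    # Append the totals, with columns in the original order
    rbind(df, newdf[,names(df)])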
    #   Number        City Year                                          Info   Value Status
    # 1   3117             2001                                        (   .)  765690       
    # 2   3118             2001                                        (   .)  669198       
    # 3   3119             2001                                        (   .) 1554447       
    # 4   3120             2001                                        (   .)  445631       
    # 5   3121             2001                                        (   .)  600987       
    # 6   3122             2001                                        (   .)  980563       
    # 7   <NA>    City Sum 2001                                          <NA> 5016516   <NA>
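
    The total row above has NA in Number, Info and Status and uses the label "City Sum". The question asked for "Республика Казахстан" in City and filled the other columns in its loop, so a possible variation, sketched here on a reduced copy of the data (three of the six rows), is to include Info and Status in the grouping so they are carried over. This assumes Info and Status are constant within each year, as they are in the sample; the names totals and result are introduced for illustration.

    ```r
    # Reduced copy of the question's data (three of the six rows), for illustration
    df <- data.frame(
      Number = c("3117", "3118", "3119"),
      City   = c("Акмолинская", "Актюбинская", "Алматинская"),
      Year   = c("2001", "2001", "2001"),
      Info   = "Среднегодовая численность населения РК (чел.)",
      Value  = c("765690", "669198", "1554447"),
      Status = "Факт",
      stringsAsFactors = FALSE
    )
    df$Value <- as.numeric(df$Value)

    # Group by Year, Info and Status so those columns are carried into the total row
    totals <- aggregate(Value ~ Year + Info + Status, data = df, FUN = sum)
    totals$Number <- "0"                      # placeholder id, as in the question's loop
    totals$City   <- "Республика Казахстан"   # country label requested in the question

    # Append the totals, with columns in the original order
    result <- rbind(df, totals[, names(df)])
    print(result)
    ```

    If Info or Status vary within a year, this grouping would produce one total per combination rather than one per year, so the NA-filling version above is the safer default.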
    

    dplyr

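    library(dplyr)
    # Convert Value from character to numeric so it can be summed
    df <- mutate(df, Value = as.numeric(Value))
    df %>%
      group_by(Year) %>%
      # One total row per Year; columns not named here become NA when bound
      summarize(City = "City Sum", Value = sum(Value)) %>%
      # Append the totals below the original rows
      bind_rows(df, .)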
    #   Number        City Year                                          Info   Value Status
    # 1   3117             2001                                        (   .)  765690       
    # 2   3118             2001                                        (   .)  669198       
    # 3   3119             2001                                        (   .) 1554447       
    # 4   3120             2001                                        (   .)  445631       
    # 5   3121             2001                                        (   .)  600987       
    # 6   3122             2001                                        (   .)  980563       
    # 7   <NA>    City Sum 2001                                          <NA> 5016516   <NA>
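
    The sample data contains only 2001, but both approaches group by Year, so they should give one total row per year when more years are present. A small base R sketch on toy data shows this; the cities "A" and "B" and all values are made up for illustration, and the final sort is an optional extra that places each total directly after its own year's rows.

    ```r
    # Toy data: two cities, two years (values are made up for illustration)
    toy <- data.frame(
      City  = c("A", "B", "A", "B"),
      Year  = c("2001", "2001", "2002", "2002"),
      Value = c(10, 20, 30, 40),
      stringsAsFactors = FALSE
    )

    # One total row per Year
    totals <- aggregate(Value ~ Year, data = toy, FUN = sum)
    totals$City <- "City Sum"

    # Append the totals below the original rows
    combined <- rbind(toy, totals[, names(toy)])

    # Optional: order() keeps ties in their original order, so sorting by Year
    # leaves each total row after the city rows of the same year
    combined <- combined[order(combined$Year), ]
    rownames(combined) <- NULL
    print(combined)
    ```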