Tags: r, function, row, factors

Create a new row in a data frame where one element is a factor and the others are numeric


I am working on some fairly basic descriptive statistics for a large group of data, and I have written a function to try to get the statistics that I need.

I want to add a new row at the bottom of a data frame in which one element is a factor ("total") and the other elements are numeric (the sums of the rows above).

Here is a reproducible example:

Create the data frame:

df <- data.frame(
    pop  = 201:250,
    age  = factor(rep(c("20-29", "30-39", "40-49", "50-59", "60-69"), 10)),
    year = factor(rep(c(2012, 2013, 2014, 2015, 2016), 10))
)

Write a function to do the aggregation:

DiabMort_fun <- function(VDRpop, VDRage, nyrs, nrows) {
    Aggregate_fun <- function(pop, ag1, nyrs, nrows, names_list) {
        # Sum the population within each age group
        popbylist <- data.frame(aggregate(pop, by = list(Category = ag1), FUN = sum))
        # Average the sums over the number of years
        popbylist$mean <- popbylist$x / nyrs
        colnames(popbylist) <- names_list
        # Add the "total" row (this is the line that triggers the warning)
        popbylist[nrows, ] <- c("total", sum(popbylist[2]), sum(popbylist[3]))
        return(popbylist)
    }

    VDRbyage <- Aggregate_fun(pop = VDRpop, ag1 = VDRage, nyrs = nyrs, nrows = nrows,
                              names_list = c("Age", "Num_pop_VDR", "Mean_pop_VDR"))
    return(VDRbyage)
}

Run the function:

test <- DiabMort_fun(VDRpop = df$pop, VDRage = df$age,
                     nyrs = 5, nrows = 5)
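To see what the total row is being written into, the intermediate table can be checked on its own. The sketch below rebuilds the question's df and hard-codes nyrs = 5. aggregate() keeps the grouping column as a factor and returns one row per age group, so the table already has five rows: with nrows = 5, popbylist[nrows, ] writes over the 60-69 row instead of appending, and nrow(popbylist) + 1 would be the index of a genuinely new row.

```r
# Rebuild the question's example data
df <- data.frame(
  pop  = 201:250,
  age  = factor(rep(c("20-29", "30-39", "40-49", "50-59", "60-69"), 10)),
  year = factor(rep(c(2012, 2013, 2014, 2015, 2016), 10))
)

# The table Aggregate_fun() builds before the total row is added (nyrs = 5)
popbylist <- aggregate(df$pop, by = list(Category = df$age), FUN = sum)
popbylist$mean <- popbylist$x / 5

print(popbylist)
print(nrow(popbylist))            # one row per age group
print(class(popbylist$Category))  # the grouping column is still a factor
```

The group sums run from 2235 to 2275 in steps of 10 and add up to 11275, and 11275 / 5 = 2255, matching the numbers in the question.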

When I run this, I get the following warning:

Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "total") :
  invalid factor level, NA generated

The "total" row is now c(NA, 11275, 2255).

What I would like it to be is c("total", 11275, 2255).
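The warning comes from the Age column: aggregate() keeps the grouping column as a factor, and a factor can only store values that are already among its levels, so assigning "total" stores NA instead. A minimal reproduction, independent of the question's data (the two-level age vector is made up for illustration):

```r
# A factor can only hold values that are already among its levels
age <- factor(c("20-29", "30-39"))
print(levels(age))

# Assigning a value that is not a level produces NA plus a warning.
# withCallingHandlers() records the warning and muffles it so the script continues.
warned <- FALSE
invisible(withCallingHandlers(
  age[3] <- "total",
  warning = function(w) {
    warned <<- TRUE
    message("Caught warning: ", conditionMessage(w))
    invokeRestart("muffleWarning")
  }
))

print(age)          # the third element is NA, not "total"
print(levels(age))  # the levels are unchanged
```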

Does anyone know how to create a new row in this function that expands the factor levels to include "total"? The relevant line within the function is:

popbylist[nrows,] <- c("total", sum(popbylist[2]), sum(popbylist[3]))
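One way to do this in place, sketched under the question's setup (nyrs = 5 hard-coded and the intermediate table rebuilt outside the function for brevity), is to add "total" to the factor levels with levels<- before assigning. Two further changes are worth making: use nrow(popbylist) + 1 as the row index so the row is appended rather than overwriting an existing one, and pass the values as a list() rather than c(), because c() coerces everything to character and would turn the numeric columns into character columns.

```r
df <- data.frame(
  pop  = 201:250,
  age  = factor(rep(c("20-29", "30-39", "40-49", "50-59", "60-69"), 10)),
  year = factor(rep(c(2012, 2013, 2014, 2015, 2016), 10))
)

# Same intermediate table as inside Aggregate_fun() (nyrs = 5)
popbylist <- aggregate(df$pop, by = list(Category = df$age), FUN = sum)
popbylist$mean <- popbylist$x / 5
colnames(popbylist) <- c("Age", "Num_pop_VDR", "Mean_pop_VDR")

# c() would coerce the whole row to character
print(class(c("total", 11275, 2255)))

# 1. Make "total" a valid level of the Age factor
levels(popbylist$Age) <- c(levels(popbylist$Age), "total")

# 2. Append a new row; a list keeps each column's own type
popbylist[nrow(popbylist) + 1, ] <- list("total",
                                         sum(popbylist$Num_pop_VDR),
                                         sum(popbylist$Mean_pop_VDR))
print(popbylist)
```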

Thanks


Solution

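  • You shouldn't need to make the age and year columns factors. If you drop the factor() calls and set stringsAsFactors = FALSE in the first data.frame() call (the default since R 4.0.0), aggregate() returns the Age column as plain character and your function should work. Be aware that c("total", ...) is a character vector, so that assignment also converts the two numeric columns to character.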

    If you want to keep the current column order and data types, turn the summary row into a one-row data frame and bind it to the other frame with rbind(), which expands the factor levels to include "total". rbind() matches data-frame columns by name, so make sure the column names match:

    # Inside Aggregate_fun(), in place of the popbylist[nrows, ] <- ... line
    temp <- data.frame("total", sum(popbylist[[2]]), sum(popbylist[[3]]))
    colnames(temp) <- names_list
    popbylist <- rbind(popbylist, temp)
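Putting the rbind() approach into the question's Aggregate_fun() gives something like the sketch below. It is an illustration, not the answerer's exact code: the nrows argument is dropped (a hypothetical simplification) because rbind() always appends at the bottom, and [[ ]] pulls each column out as a vector before summing. Because the first data frame's Age column is a factor, rbind() adds "total" to its levels, and the numeric columns stay numeric.

```r
# Sketch: Aggregate_fun() with the total row added via rbind()
# (hypothetical simplification: the nrows argument is no longer needed)
Aggregate_fun <- function(pop, ag1, nyrs, names_list) {
  # Sum the population within each group, then average over the years
  popbylist <- aggregate(pop, by = list(Category = ag1), FUN = sum)
  popbylist$mean <- popbylist$x / nyrs
  colnames(popbylist) <- names_list

  # One-row data frame with the totals; its column names must match for rbind()
  temp <- data.frame("total", sum(popbylist[[2]]), sum(popbylist[[3]]))
  colnames(temp) <- names_list

  # rbind() expands the factor levels of the first column to include "total"
  rbind(popbylist, temp)
}

df <- data.frame(
  pop  = 201:250,
  age  = factor(rep(c("20-29", "30-39", "40-49", "50-59", "60-69"), 10)),
  year = factor(rep(c(2012, 2013, 2014, 2015, 2016), 10))
)

test <- Aggregate_fun(pop = df$pop, ag1 = df$age, nyrs = 5,
                      names_list = c("Age", "Num_pop_VDR", "Mean_pop_VDR"))
print(test)
print(levels(test$Age))
```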