Categorical variable with more than five categories not showing in sumtable in R


I am trying to conduct a balance test for treatment and control groups. Using sumtable from the vtable package, I constructed a summary statistics table by group. However, a categorical variable with more than five categories does not show up in the table.

So for example I have a sample dataframe like this:

Treatment <- c("Treated", "Control", "Control", "Treated", "Treated", "Treated", "Control", "Treated", "Control", "Control")
City <- c(1, 4, 6, 2, 3, 3, 2, 5, 4, 6)
Age <- c(56, 70, 12, 54, 23, 9, 33, 38, 27, 49)
Gender <- c(1, 2, 3, 2, 2, 1, 1, 3, 2, 1)
df <- data.frame(Treatment, City, Age, Gender)

I label City and Gender accordingly:

label_city <- c("1" = "City A",
                "2" = "City B",
                "3" = "City C",
                "4" = "City D",
                "5" = "City E",
                "6" = "City F")
df$City <- label_city[match(df$City, names(label_city))]

label_gender <- c("1" = "Male",
                  "2" = "Female",
                  "3" = "Other")
df$Gender <- label_gender[match(df$Gender, names(label_gender))]

Then I create the table:

library(vtable)
sumtable(df, group = "Treatment", group.test = TRUE)

I get a summary statistics table with Age and Gender, but without City. When I restrict City to up to five categories, it appears on the table. Is there a way to make City present in the summary table with all the categories?


Solution

  • Got an answer from the maintainer:

    vtable automatically converts character variables into factors for display, but it doesn't do so when there are too many different values of the variable, because then it's probably an actual string variable and there would be N different categories.

    So after converting the City column from character to factor (see "Convert data.frame column format from character to factor"), all the categories were displayed in the sumtable output.
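
    The conversion described above can be sketched as follows. This is a minimal example rather than the maintainer's own code: it rebuilds the data frame from the question and uses base R's factor() to label and convert City and Gender in one step, so sumtable receives factors instead of character columns. It assumes the vtable package is installed; converting Gender is optional here and is done only for consistency.

```r
library(vtable)  # assumes the vtable package is installed

# Rebuild the example data from the question
Treatment <- c("Treated", "Control", "Control", "Treated", "Treated",
               "Treated", "Control", "Treated", "Control", "Control")
City   <- c(1, 4, 6, 2, 3, 3, 2, 5, 4, 6)
Age    <- c(56, 70, 12, 54, 23, 9, 33, 38, 27, 49)
Gender <- c(1, 2, 3, 2, 2, 1, 1, 3, 2, 1)
df <- data.frame(Treatment, City, Age, Gender)

# Label and convert to factor in one step, so the columns are
# explicit factors rather than character vectors
df$City <- factor(df$City, levels = 1:6,
                  labels = paste("City", LETTERS[1:6]))
df$Gender <- factor(df$Gender, levels = 1:3,
                    labels = c("Male", "Female", "Other"))

# Same call as in the question; City is now passed in as a factor
sumtable(df, group = "Treatment", group.test = TRUE)
```

    If the column has already been turned into a character vector, as in the question, df$City <- as.factor(df$City) should have the same effect.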