Search code examples
rstatisticsgtsummary

labelling of ordered factor variable


I am trying to produce a univariate output table using the gtsummary package.

structure(list(id = 1:10, age = structure(c(3L, 3L, 2L, 3L, 2L, 
2L, 2L, 1L, 1L, 1L), .Label = c("c", "b", "a"), class = c("ordered", 
"factor")), sex = structure(c(2L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 2L), .Label = c("F", "M"), class = "factor"), country = structure(c(1L, 
1L, 1L, 1L, 3L, 3L, 3L, 2L, 2L, 2L), .Label = c("eng", "scot", 
"wale"), class = "factor"), edu = structure(c(1L, 1L, 1L, 2L, 
2L, 2L, 3L, 3L, 3L, 3L), .Label = c("x", "y", "z"), class = "factor"), 
lungfunction = c(45L, 23L, 25L, 45L, 70L, 69L, 90L, 50L, 
62L, 45L), ivdays = c(15L, 26L, 36L, 34L, 2L, 4L, 5L, 8L, 
9L, 15L), no2 = c(40L, 70L, 50L, 60L, 30L, 25L, 80L, 89L, 
10L, 40L), pm25 = c(15L, 20L, 36L, 48L, 25L, 36L, 28L, 15L, 
25L, 15L)), row.names = c(NA, 10L), class = "data.frame")

...
library(gtsummary)
publication_dummytable1_sum %>% 
select(sex,age,lungfunction,ivdays) %>% 
tbl_uvregression(
method =lm,
y = lungfunction,
pvalue_fun = ~style_pvalue(.x, digits = 3)
) %>% 
add_global_p() %>%  # add global p-value 
bold_p() %>%        # bold p-values under a given threshold
bold_labels()
...   

When I run this code I get the output below. The issue is the labeling of the ordered factor variable (age). R chooses its own labeling for the ordered factor variable. Is it possible to tell R not to choose its own labeling for ordered factor variables?

enter image description here

I want output like the following:

enter image description here


Solution

  • Like many other people, I think you might be misunderstanding the meaning of an "ordered" factor in R. All factors in R are ordered, in a sense; the estimates etc. are typically printed, plotted, etc. in the order of the levels vector. Specifying that a factor is of type ordered has two major effects:

    • it allows you to evaluate inequalities on the levels of the factor (e.g. you can filter(age > "b"))
    • the contrasts are set by default to orthogonal polynomial contrasts, which is where the L (linear) and Q (quadratic) labels come from: see e.g. this CrossValidated answer for more details.

    If you want this variable treated in the same way a regular factor (so that the estimates are made for differences of groups from the baseline level, i.e. treatment contrasts), you can:

    • convert back to an unordered factor (e.g. factor(age, ordered=FALSE))
    • specify that you want to use treatment contrasts in your model (in base R you would specify contrasts = list(age = "contr.treatment"))
    • set options(contrasts = c(unordered = "contr.treatment", ordered = "contr.treatment")) (the default for ordered is "contr.poly")

    If you have an unordered ("regular") factor and the levels are not in the order you want, you can reset the level order by specifying the levels explicitly, e.g.

    mutate(across(age, factor, 
       levels = c("0-10 years", "11-20 years", "21-30 years", "30-40 years")))
    

    R sets the factors in alphabetical order by default, which is sometimes not what you want (but I can't think of a case where the order would be 'random' ...)