
R: Ordered factor in expss table creation is in the wrong order


I'd be grateful if someone could tell me why the following is happening and how to correct it.

I'm using the expss package to create a table as follows:

table <- dta %>%
        tab_cells(dta[["x"]]) %>%
        tab_rows(factor(dta[["y"]], ordered=TRUE)) %>%
        tab_weight(dta[["weight"]]) %>%
        tab_stat_cpct(total_statistic = "w_cpct") %>%
        tab_pivot() %>%
        split_columns()

I used factor(dta[["y"]], ordered=TRUE) so that the factor levels appear in order in the table. This has worked with my other variables, but not with this one.

If I enter only factor(dta[["y"]], ordered=TRUE) into the console, it correctly returns:

Levels: 537 < 564 < 650 < 1010

However, when I use the code above to create the table, the rows are ordered as follows:

1010 537 564 650

What can I do so that it's in the correct order?

This is a sample dataset to re-create the problem:

dta <- data.frame(x = c(1,1,1,2,1,1,1,1,1,1,1,2,1,2,2,2,1,1,2,2),
                  y = c(1010,650,650,537,650,650,650,650,564,650,650,650,564,564,564,564,650,650,564,564),
                  weight = c(42.066290,3.126177,3.808385,4.812877,8.093253,1.559941,6.168395,2.419531,3.937412,4.293246,20.445602,16.504405,1.314727,2.474295,2.274015,2.668155,3.864480,2.521209,2.605202,2.194348))

Thanks a lot in advance!


Solution

  • Yes, it's a bug in expss. You can use a sorting workaround, which reorders the table according to the numeric values of the labels:

    sort_workaround = function(tbl){
        separated_labels = as.data.frame(split_labels(tbl[[1]], remove_repeated = FALSE))
        # [,-ncol(separated_labels)] to keep total position 
        separated_labels = type.convert(separated_labels, as.is = TRUE)[,-ncol(separated_labels)]
        new_order = do.call(order, separated_labels)
        tbl[new_order, ]
    }
    
    table <- dta %>%
        tab_cells(x) %>%
        tab_rows(factor(y, ordered=TRUE)) %>%
        tab_weight(weight) %>%
        tab_stat_cpct(total_statistic = "w_cpct") %>%
        tab_pivot() %>% 
        sort_workaround() %>%
        split_columns()
    
    
    table
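
To illustrate what the workaround relies on, here is a minimal base R sketch that does not use expss. It assumes the unexpected order arises because the level labels end up being compared as text rather than as numbers; that matches the output in the question but is not confirmed against the expss source. The names `labels` and `numeric_labels` are made up for this illustration.

```r
# Level labels as they look once converted to text (values from the question)
labels <- c("537", "564", "650", "1010")

# Text is compared character by character, so "1010" sorts before "537"
text_order <- sort(labels)
print(text_order)

# type.convert() turns the text back into numbers, as in sort_workaround()
numeric_labels <- type.convert(labels, as.is = TRUE)

# order() on the numeric values gives the intended row order
numeric_order <- labels[order(numeric_labels)]
print(numeric_order)
```

The same idea is applied inside `sort_workaround()`: the row labels are split into columns, converted with `type.convert()`, and the rows are then reordered with `order()`.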