r, levels, r-factor

Is it possible for different elements in a factor to share the same level?


I searched many times, but none of the results were what I wanted.

A sample dataset is provided below:

year = c(1991,1996,2001,2006,2011,2016,2021)

factor(year, levels = c(1991,1996,2001,2011,2016,2021))

The result was:

[1] 1991 1996 2001 <NA> 2011 2016 2021
Levels: 1991 1996 2001 2011 2016 2021

I want 2006 to have the same level as 2001, so my desired outcome would be:

[1] 1991 1996 2001 2006 2011 2016 2021
Levels: 1991 1996 2001 2011 2016 2021

Is it possible to make 2006 share the level of 2001 without changing the original content of the vector year?


Solution

  • If you dig into the source code of factor, the answer becomes clear (I think it is "No" to your question):

    > factor
    function (x = character(), levels, labels = levels, exclude = NA, 
        ordered = is.ordered(x), nmax = NA)
    {
        if (is.null(x))
            x <- character()
        nx <- names(x)
        if (missing(levels)) {
            y <- unique(x, nmax = nmax)
            ind <- order(y)
            levels <- unique(as.character(y)[ind])
        }
        force(ordered)
        if (!is.character(x))
            x <- as.character(x)
        levels <- levels[is.na(match(levels, exclude))]
        f <- match(x, levels)
        if (!is.null(nx))
            names(f) <- nx
        if (missing(labels)) {
            levels(f) <- as.character(levels)
        }
        else {
            nlab <- length(labels)
            if (nlab == length(levels)) {
                nlevs <- unique(xlevs <- as.character(labels))
                at <- attributes(f)
                at$levels <- nlevs
                f <- match(xlevs, nlevs)[f]
                attributes(f) <- at
            }
            else if (nlab == 1L)
                levels(f) <- paste0(labels, seq_along(levels))
            else stop(gettextf("invalid 'labels'; length %d should be 1 or %d",
                nlab, length(levels)), domain = NA)
        }
        class(f) <- c(if (ordered) "ordered", "factor")
        f
    }
    <bytecode: 0x00000186f0fe3640>
    <environment: namespace:base>
    

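    As we can see, levels is either generated from unique(x, nmax = nmax) when the levels argument is not provided, or taken from the given levels as levels[is.na(match(levels, exclude))]. Each element of x is then looked up with match(x, levels), so a value that is missing from levels (here 2006) becomes NA. That means it is not possible for two different x values to share a single entry of levels.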
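
    The matching step can be reproduced by hand. The sketch below is only an illustration, not part of the original answer. Its first part repeats the match(x, levels) lookup and shows why 2006 turns into NA. Its second part relies on the labels branch visible in the source above (nlevs <- unique(...)), which, as far as I know, accepts duplicated labels in R 3.4.0 and later. That lets 2001 and 2006 share one integer code, but the element then prints as 2001 rather than 2006, so it is a workaround for grouping and not the exact output requested. The names lv, codes and f are made up for this example.

```r
year <- c(1991, 1996, 2001, 2006, 2011, 2016, 2021)

# Part 1: the lookup that factor() performs internally.
# 2006 is not among the levels, so its code is NA.
lv <- c(1991, 1996, 2001, 2011, 2016, 2021)
codes <- match(as.character(year), lv)
print(codes)
# [1]  1  2  3 NA  4  5  6

# Part 2: list every value in `levels`, then give 2001 and 2006
# the same label. The duplicated label is collapsed into one level.
f <- factor(year,
            levels = c(1991, 1996, 2001, 2006, 2011, 2016, 2021),
            labels = c(1991, 1996, 2001, 2001, 2011, 2016, 2021))
print(f)
# [1] 1991 1996 2001 2001 2011 2016 2021
# Levels: 1991 1996 2001 2011 2016 2021

# Both 2001 and 2006 now carry the same integer code.
print(as.integer(f))
# [1] 1 2 3 3 4 5 6
```

    The vector year itself is left untouched, so the original values can still be read from year while f is used wherever the merged grouping is needed.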