
How to map names between factors that have identical levels but different lengths?


I have two factors tID and fff with the same levels but different lengths of 45000000 and 23000, respectively:

> head(factor(tID))
Fungi Metazoa   Fungi   Fungi   Fungi   Fungi 
227321   79782   52586 1658174  573508   88771

Levels: 2 7 9 11 14 16 17 19 20 22 23 24 32 33 34 38 39 41 42 43 47 48 51 52 54 56 61 68 69 72 75 81 85 86 103 104 106 114 119 120 122 124 125 128 134 140 141 142 143 144 148 154 158 159 162 163 165 167 171 172 173 174 179 ... 1985254

> head(fff)
[1] 4932   870730 34413  4932   4932   9606  
Levels: 2 7 9 11 14 16 17 19 20 22 23 24 32 33 34 38 39 41 42 43 47 48 51 52 54 56 61 68 69 72 75 81 85 86 103 104 106 114 119 120 122 124 125 128 134 140 141 142 143 144 148 154 158 159 162 163 165 167 171 172 173 174 179 ... 1985254

Is there a faster way to map the names from factor tID onto fff?

I know I can do this using lapply() or sapply(), but the factors contain 4.5 million elements, so it's a bit slow.


Solution

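  • Use names() together with match(). match(fa, fb) returns, for each element of fa, the position of its first occurrence in fb, and it is vectorized, so no lapply() or sapply() loop is needed: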

    names(fa) <- names(fb)[match(fa, fb)]
    

    you get:

    > fa
     name_1  name_1  name_1  name_6  name_6  name_6 name_11 name_11 name_11 name_16 name_16 name_16 name_21 name_21 name_21 
          a       a       a       b       b       b       c       c       c       d       d       d       e       e       e 
    Levels: a b c d e
    

    For the data in the question, this becomes:

    names(fff) <- names(tID)[match(fff, tID)]
    

    Example data:

    fa <- factor(rep(letters[1:5], each = 3))
    fb <- factor(rep(letters[1:5], each = 5))
    fb <- setNames(fb, paste0('name_',seq_along(fb)))
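
Since the question is about speed, here is a minimal sketch of a level-wise variant of the same idea. It is not part of the original answer: it assumes both factors share identical levels in the same order (as stated in the question), and the helper names first_pos, name_by_level, direct and by_level are made up for illustration. It looks up one name per level and then indexes by the integer level codes. Whether this is actually faster than a single match() call would need to be timed on the real data.

```r
# Toy data mirroring the example above (names are illustrative only).
fa <- factor(rep(letters[1:5], each = 3))
fb <- factor(rep(letters[1:5], each = 5))
fb <- setNames(fb, paste0("name_", seq_along(fb)))

# Direct approach from the answer: name of the first element of fb
# that equals each element of fa.
direct <- names(fb)[match(fa, fb)]

# Level-wise variant (assumption: identical levels in the same order).
stopifnot(identical(levels(fa), levels(fb)))
first_pos     <- match(levels(fb), fb)      # first occurrence of each level in fb
name_by_level <- names(fb)[first_pos]       # one name per level, NA if level unused
by_level      <- name_by_level[as.integer(fa)]  # index by integer level codes

# Both approaches should agree before the names are assigned.
stopifnot(identical(direct, by_level))
names(fa) <- by_level
```

Note that both approaches take the name of the first matching element, so if a value in the named factor carries several different names, only the first one is used.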