
R - Replace unique identifiers with something less complicated


I have two data frames that are related by a really long user ID, and I want to replace these values with something more readable, like a simple integer. Obviously the new values need to stay consistent between the two data frames, and I was wondering if there is a simple way to do this. Here is what the data frames look like:

ArtistData - Shows how many times a user listened to a particular artist:

UserID                                     Artist      Plays
00000c289a1829a808ac09c00daf10bc3c4e223b   elvenking   706
00000c289a1829a808ac09c00daf10bc3c4e223b   lunachicks  538
00001411dc427966b17297bf4d69e7e193135d89   stars       373
...                                        ...         ...

UserData - Shows information on each individual user:

UserID                                     gender   age  country
00001411dc427966b17297bf4d69e7e193135d89   m        21   Germany
00004d2ac9316e22dc007ab2243d6fcb239e707d   f        34   Mexico
000063d3fe1cf2ba248b9e3c3f0334845a27a6bf   m        27   Poland
...                                        ...      ...  ...

So basically, can I replace these long strings that have no meaning for me with an integer that is consistent between each data frame?


Solution

  • Convert to factors with simplified labels, using every UserID that appears in either dataset, so the same ID gets the same label in both:

    levs <- union(UserData$UserID, ArtistData$UserID)
    
    ArtistData$newid <- factor(
      ArtistData$UserID, levels=levs, labels=seq_along(levs)
    )
    
    UserData$newid <- factor(
      UserData$UserID, levels=levs, labels=seq_along(levs)
    )
    
    ArtistData
    #                                    UserID     Artist Plays newid
    #1 00000c289a1829a808ac09c00daf10bc3c4e223b  elvenking   706     4
    #2 00000c289a1829a808ac09c00daf10bc3c4e223b lunachicks   538     4
    #3 00001411dc427966b17297bf4d69e7e193135d89      stars   373     1
    
    UserData
    #                                    UserID gender age country newid
    #1 00001411dc427966b17297bf4d69e7e193135d89      m  21 Germany     1
    #2 00004d2ac9316e22dc007ab2243d6fcb239e707d      f  34  Mexico     2
    #3 000063d3fe1cf2ba248b9e3c3f0334845a27a6bf      m  27  Poland     3
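
If plain integers are wanted rather than factors, `match()` against the same combined vector of IDs should give them directly. This is a minimal sketch, assuming the `UserID` columns are character vectors and rebuilding only the sample rows from the question; the column name `intid` is made up for illustration:

```r
# Sample rows from the question; stringsAsFactors = FALSE keeps IDs as character
ArtistData <- data.frame(
  UserID = c("00000c289a1829a808ac09c00daf10bc3c4e223b",
             "00000c289a1829a808ac09c00daf10bc3c4e223b",
             "00001411dc427966b17297bf4d69e7e193135d89"),
  Artist = c("elvenking", "lunachicks", "stars"),
  Plays  = c(706, 538, 373),
  stringsAsFactors = FALSE
)
UserData <- data.frame(
  UserID  = c("00001411dc427966b17297bf4d69e7e193135d89",
              "00004d2ac9316e22dc007ab2243d6fcb239e707d",
              "000063d3fe1cf2ba248b9e3c3f0334845a27a6bf"),
  gender  = c("m", "f", "m"),
  age     = c(21, 34, 27),
  country = c("Germany", "Mexico", "Poland"),
  stringsAsFactors = FALSE
)

# Every ID that occurs in either data frame, in a fixed order
levs <- union(UserData$UserID, ArtistData$UserID)

# match() returns each ID's position in levs as a plain integer
ArtistData$intid <- match(ArtistData$UserID, levs)
UserData$intid   <- match(UserData$UserID, levs)

ArtistData$intid
# [1] 4 4 1
UserData$intid
# [1] 1 2 3
```

The numbers agree with the factor labels above because both are positions in `levs`. An ID that is missing from `levs` would come back as `NA`, which cannot happen here since `levs` is built from both columns.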