Search code examples
rdataframedata.tableplyr

Using a key to replace values across a whole data.table


I have a large data table which looks as follows:

    V1              V2           V3  V4  V5  V6  V7  V8  V9
1: XS0285400197 TR.IssuerRating  F1  F1  F1  F1  F1  F1  F1
2: XS0041971275 TR.IssuerRating AAA AAA AAA AAA  F1  F1  AAA
3: XS0043098127 TR.IssuerRating  WD  WD  WD  WD  WD  WD  WD

structure(list(V1 = c("XS0285400197", "XS0041971275", "XS0043098127"
), V2 = c("TR.IssuerRating", "TR.IssuerRating", "TR.IssuerRating"
), V3 = c("F1", "AAA", "WD"), V4 = c("F1", "AAA", "WD"), V5 = c("F1", 
"AAA", "WD"), V6 = c("F1", "AAA", "WD"), V7 = c("F1", "F1", "WD"
), V8 = c("F1", "F1", "WD"), V9 = c("F1", "AAA", "WD")), class = "data.frame", row.names = c(NA, 
-3L))

The actual data table is much larger but this should serve as an example. Additionally, I have a key where I want to replace the ratings (here F1,AAA and WD) with numbers.

    Rating CreditQuality
1:   F1               2
2:  AAA               1
3:  WD                6
4:  (P)B2             6
5: (P)Ba1             4
6: (P)Ba2             5

structure(list(Rating = c("F1", "AAA", "WD", "(P)B2", "(P)Ba1", 
"(P)Ba2"), CreditQuality = c(2L, 1L, 6L, 6L, 4L, 5L)), class = "data.frame", row.names = c(NA, 
-6L))

I want to replace these ratings with the CreditQuality I have assigned each rating in the key. This would mean that a cell with F1 is now a 2. A cell with WD would be a 6 and so on. The new table should look as follows:

    V1              V2           V3  V4  V5  V6  V7  V8  V9
1: XS0285400197 TR.IssuerRating   2   2   2   2  2    2  2
2: XS0041971275 TR.IssuerRating   1   1   1   1  2    2  1
3: XS0043098127 TR.IssuerRating   6   6   6   6  6    6  6
      

I have tried using match and mapvalues however match only seems to work for a single column and mapvalues only works for an atomic vector not for a data.table. Some people have had similar problems however most of them only needed to replace values in a single column whereas I want to replace values across multiple columns in a data.table


Solution

  • You can use melt and dcast:

    dcast(
      rating[melt(df, id=c("V1", "V2"),value.name = "Rating"), on="Rating"],
      V1+V2~variable, value.var = "CreditQuality"
    )
    

    Output:

                 V1              V2 V3 V4 V5 V6 V7 V8 V9
    1: XS0041971275 TR.IssuerRating  1  1  1  1  2  2  1
    2: XS0043098127 TR.IssuerRating  6  6  6  6  6  6  6
    3: XS0285400197 TR.IssuerRating  2  2  2  2  2  2  2
    

    Note: I'm assuming your source data is df, and your Rating data is rating. I see that your frames are already of class data.table