Search code examples
rdata.tabledata-manipulationrescale

Rescale multiple variable at once


I would like to rescale multiple variables at once. Each one of the variable should be rescaled between 0 and 10. My dataset looks something like this

df<-structure(list(Year = 1985:2012, r_mean_dp_C_EU_PTA = c(0.166685371371432,0, 0.340384674048008, 0.255663634111618, 0.137833312888481, 0.215940736735375,0.695926742038269, 1.12488458324014, 1.50426967770413, 1.96800275204271, 
                                                             1.84220420613839, 2.55081439923073, 2.83958315572122, 3.02471358081631, 2.76227596053162, 5.13672466755955, 6.22501740311663, 6.04685020876299, 
                                                             5.48990293535953, 5.74245144436088, 6.87554176822673, 5.35866756802216,6.21821261660873, 7.39740372167956, 7.37052059919359, 8.4053331043966, 
                                                             7.88284279150424, 10), 
                    r_mean_dp_C_US_PTA = c(0, 0.0243131684738152,  0.0295348762350131, 1.24572619158458, 1.20624633452509, 1.57418568231032,1.45479246796848, 2.38700784566208, 2.62865525326503, 2.26401361870534,2.67319203680329, 2.64440548764366, 3.10459526464658, 3.05231530072328, 
                                                                                                           3.32660416229216, 4.14909239351474, 3.76404440984403, 3.79766644256544,4.55279786294561, 5.57506946922008, 6.83412605593388, 8.07241989452914,9.10370786838265, 9.51564633960853, 8.64357423479438, 9.10723202296861,10, 9.06442082870898), 
                    r_mean_dp_C_eu_esr_sum = c(0.0267071299038037,0, 0.0481033555876806, 0.039231355183461, 0.0255363040160583,0.0284158726695472, 0.234715155525714, 0.544954230234254, 0.683338138878583, 0.828929653572072, 0.950656658215744, 1.21492080702167, 1.30147631753441, 1.36122263965133, 1.33106989847101, 1.7848396827464, 2.19247065377408, 2.1506217173316, 4.91794342139369, 4.83398913690854, 7.28545175419305,5.42827409024432, 7.34375238832023, 8.91410171271897, 8.98533852868884,  9.17361943843028, 9.21421152468197, 10)), row.names = c(NA, -28L
                                                                                                                                                             ), 
class = c("data.table", "data.frame"))

I have tried to use the package scales but it does not work

While the function with name identifiers fails

library(scales)
vars<-names(df[,2:4])
tst<-setDT(df)[, (vars):=lapply((vars), function(x) rescale(x,to = c(0,10)))]

Using position identifiers sets all the variable values to 5 which is not what I am looking for.

tst<-setDT(df)[, 2:4:=lapply(2:4, function(x) rescale(x,to = c(0,10)))]
tst
#    Year r_mean_dp_C_EU_PTA r_mean_dp_C_US_PTA r_mean_dp_C_eu_esr_sum
# 1: 1985                  5                  5                      5
# 2: 1986                  5                  5                      5
# 3: 1987                  5                  5                      5
# 4: 1988                  5                  5                      5
# 5: 1989                  5                  5                      5
# 6: 1990                  5                  5                      5
# 7: 1991                  5                  5                      5
# 8: 1992                  5                  5                      5
# 9: 1993                  5                  5                      5
# 10: 1994                  5                  5                      5
# 11: 1995                  5                  5                      5
# 12: 1996                  5                  5                      5
# 13: 1997                  5                  5                      5
# 14: 1998                  5                  5                      5
# 15: 1999                  5                  5                      5
# 16: 2000                  5                  5                      5
# 17: 2001                  5                  5                      5
# 18: 2002                  5                  5                      5
# 19: 2003                  5                  5                      5
# 20: 2004                  5                  5                      5
# 21: 2005                  5                  5                      5
# 22: 2006                  5                  5                      5
# 23: 2007                  5                  5                      5
# 24: 2008                  5                  5                      5
# 25: 2009                  5                  5                      5
# 26: 2010                  5                  5                      5
# 27: 2011                  5                  5                      5
# 28: 2012                  5                  5                      5

Does anyone know what I am doing wrong?

Thanks a lot in advance for your help


Solution

  • We can use .SDcols.

    To apply by names

    library(data.table)
    
    df[, (vars):= lapply(.SD, scales::rescale, to = c(0, 10)), .SDcols = vars]
    

    To apply by position

    df[, 2:4 := lapply(.SD, scales::rescale, to = c(0, 10)), .SDcols = 2:4]