Search code examples
rdata.tablemelt

melt dataframe - multiple columns - "Enhanced (new) functionality from data.tables"


Update: I should have been clearer that I was trying to check out the Enhanced functionality in reshaping using data.tables https://cran.r-project.org/web/packages/data.table/vignettes/datatable-reshape.html. Updated the title.

I have this data set with two sets of variables - Credit_Risk_Capital and Name_concentration. They are calculated per 2 methodologies - New and Old. When I melt them using the data.table package, the variable names default to 1 and 2. How can I change them to just Credit_Risk_Capital and Name_Concentration.

Here is the data set

    df <-data.table (id = c(1:100),Credit_risk_Capital_old= rnorm(100, mean = 400, sd = 60),
             NameConcentration_old= rnorm(100, mean = 100, sd = 10),
             Credit_risk_Capital_New =rnorm(100, mean = 200, sd = 10),
             NameConcentration_New = rnorm(100, mean = 40, sd = 10))
    old <- c('Credit_risk_Capital_old','NameConcentration_old')
   new<-c('Credit_risk_Capital_New','NameConcentration_New')
  t1<-melt(df, measure.vars = list(old,new), variable.name = "CapitalChargeType",value.name = c("old","new"))

Now instead of the elements in the CapitalChargeType Column getting tagged as 1's and 2's, I want them to be changed to Credit_risk_Capital and NameConcentration. I can obviously change them in a subsequent step using a 'match' function, but is there anyway I can do it within melt itself.


Solution

  • I'm not sure about using melt, but here's a way using tidyr

    Note that I changed the variable name to use a . instead of _ to separate the name for the old/new. This makes it easier to separate the name into two variables since there are already many underscores.

    library(tidyr)
    
    df <- dplyr::data_frame(
      id = c(1:100),
      Credit_risk_Capital.old= rnorm(100, mean = 400, sd = 60),
      NameConcentration.old= rnorm(100, mean = 100, sd = 10),
      Credit_risk_Capital.new =rnorm(100, mean = 200, sd = 10),
      NameConcentration.new = rnorm(100, mean = 40, sd = 10)
    )
    
    df %>% 
      gather("key", "value", -id) %>% 
      separate(key, c("CapitalChargeType", "new_old"), sep = "\\.") %>% 
      spread(new_old, value)
    
    #> # A tibble: 200 x 4
    #>       id   CapitalChargeType       new       old
    #> *  <int>               <chr>     <dbl>     <dbl>
    #> 1      1 Credit_risk_Capital 182.10955 405.78530
    #> 2      1   NameConcentration  42.21037  99.44172
    #> 3      2 Credit_risk_Capital 184.28810 370.14308
    #> 4      2   NameConcentration  60.92340 120.13933
    #> 5      3 Credit_risk_Capital 191.07982 389.50818
    #> 6      3   NameConcentration  25.81776  90.91502
    #> 7      4 Credit_risk_Capital 193.64247 327.56853
    #> 8      4   NameConcentration  32.71050  94.95743
    #> 9      5 Credit_risk_Capital 208.63547 286.59351
    #> 10     5   NameConcentration  40.76064 116.52747
    #> # ... with 190 more rows