Search code examples
rreshapetransposedata-manipulationreshape2

restructuring a data frame of co-variances from long to wide


A data frame have contains three variables:

  1. from - character - the name of a measure
  2. to - character - the name of another measure
  3. covariance - numeric - the covariance between the two measures

Here's a link to the data. Below is the result of head(have):

from                   to                     covariance
a_airportscreener      a_airportscreener     4.419285714
a_airportscreener      e_airportscreener    -1.328928571
a_airportscreener      g_airportscreener    -3.038928571
a_airportscreener      p_airportscreener    0.3292857143
a_airportscreener   pres_airportscreener    0.6452857143
a_automechanic            a_automechanic     2.635535714
a_automechanic            e_automechanic   -0.3439285714

I want to create a data frame called need that records the covariances between prefixed versions of the same job title in separate columns. For example, the first row would look like:

job                a_a       a_e       a_g       a_p       a_pres       e_a       e_e       e_g      e_p      e_pres      g_a      g_e      g_g      g_p      g_pres      p_a      p_e      p_g      p_p      p_pres      pres_a      pres_e      pres_g      pres_p      pres_pres
airportscreener  4.419    -1.329    -3.039     0.329        0.645    -1.329     2.333     2.441   -1.015       0.659   -3.039    2.441   14.253    3.070       0.977    0.329   -1.015    3.070    6.505       0.366       0.645       0.659       0.977       0.366          0.697

(I rounded the values in have to keep the example of need on the page, but this is not part of the question.)


Solution

  • Try this approach on your complete data

    library(tidyverse)
    
    cov_mat %>%
      rownames_to_column() %>% 
      pivot_longer(cols =-rowname) %>%
      mutate(key = paste0(sub("_.*", "\\1", name), "_", sub("_.*", "\\1", rowname)),
             rowname = sub(".*_(.*)_.*", "\\1", rowname), 
             name = sub(".*_(.*)_.*", "\\1", name)) %>%
      filter(rowname == name) %>%
      select(-rowname) %>%
      pivot_wider(names_from = key, values_from = value) 
    
    
    # A tibble: 58 x 26
    #   name   a_a      e_a    g_a     p_a  pres_a      a_e   e_e    g_e .....
    #   <chr> <dbl>    <dbl>  <dbl>   <dbl>   <dbl>    <dbl> <dbl>  <dbl>
    # 1 airp…  4.42 -1.33     -3.04  0.329   0.645  -1.33    2.33   2.44 
    # 2 auto…  2.64 -0.344     6.26 -0.712  -0.595  -0.344   0.499  0.113
    # 3 auto…  2.67 -0.466     2.36 -0.106  -0.878  -0.466   0.72  -5.95 
    # 4 blkj…  2.50  0.529    -6.79  0.0129 -0.0666  0.529   1.56  -8.58 
    # 5 blkt…  1.04 -0.00143   4.86  0.993  -0.194  -0.00143 0.229 -1.69 
    # 6 brid…  4.15  2.05    -11.5  -1.21    0.453   2.05    2.05  -9.09 
    # 7 cart…  1.79  0.458    -4.22  0.451  -0.410   0.458   1.23   3.54 
    # 8 chem…  2.29  0.479    12.4  -0.0384 -0.164   0.479   0.811  2.15 
    # 9 clth…  4.10  1.15    -18.9   1.77    0.728   1.15    1.7   -4.00 
    #10 coag…  2.23 -0.382    -7.79 -0.0190  0.460  -0.382   0.342  4.11