Search code examples
rcorrelationspreadwide-format-data

Long to wide format from first column


I have an excel file with the headings: trait, genotype_id, BLUE, BLUE_SE, BLUP, BLUP_SE and heritability. All columns contain numbers except the trait column. This column has the traits Fe, Zn and S. So I have like 3 sets of data underneath each other. I want these to be summarised for the genotype so I have something like genotype_id, BLUE_Fe, BLUE_Zn, BLUE_S, BLUE_SE_Fe etc.
How do I transform this file in R so that the traits are not only one column, but that I have the characteristics for Fe, Zn and S in separate columns respectively so that I can create a correlation matrix in R afterward?

I tried

data_wide <- spread(allBLUEs, trait, BLUE)

but this then obviously only moves the BLUEs into the same rows. I tried

data_wide <- spread(allBLUEs, key = trait, value = c(BLUE, BLUE_SE, BLUP, BLUP_SE, heritability), sep = "_")

but the value term seems to only be able to look at one column?

my data

df=tribble(~trait,~genotype_id,~BLUE,~BLUE_SE,~ BLUP,~BLUP_SE,~ heritability,
           "Fe", 3, 47.2, 2.13, 43.0, 1.76, 0.685,
           "Fe", 386, 42.5, 2.13, 39.8, 1.76, 0.685,
           "Zn", 3, 24.4, 1.74, 23.6, 1.18, 0.456,
           "S", 386, 1253, 51.3, 1269, 38.0, 0.545)

Solution

  • I think you can do it with pivot_wider form tidyr.

    library(tibble)
    library(tidyr)
    df=tribble(~trait,~genotype_id,~BLUE,~BLUE_SE,~ BLUP,~BLUP_SE,~ heritability,
               "Fe", 3, 47.2, 2.13, 43.0, 1.76, 0.685,
               "Fe", 386, 42.5, 2.13, 39.8, 1.76, 0.685,
               "Zn", 3, 24.4, 1.74, 23.6, 1.18, 0.456,
               "S", 386, 1253, 51.3, 1269, 38.0, 0.545)
    df %>%
      pivot_wider(
        names_from = trait,
        values_from = BLUE:heritability
      )
    
    #> # A tibble: 2 × 16
    #>   genoty…¹ BLUE_Fe BLUE_Zn BLUE_S BLUE_…² BLUE_…³ BLUE_…⁴ BLUP_Fe BLUP_Zn BLUP_S
    #>      <dbl>   <dbl>   <dbl>  <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>
    #> 1        3    47.2    24.4     NA    2.13    1.74    NA      43      23.6     NA
    #> 2      386    42.5    NA     1253    2.13   NA       51.3    39.8    NA     1269
    #> # … with 6 more variables: BLUP_SE_Fe <dbl>, BLUP_SE_Zn <dbl>, BLUP_SE_S <dbl>,
    #> #   heritability_Fe <dbl>, heritability_Zn <dbl>, heritability_S <dbl>, and
    #> #   abbreviated variable names ¹​genotype_id, ²​BLUE_SE_Fe, ³​BLUE_SE_Zn,
    #> #   ⁴​BLUE_SE_S
    

    Created on 2023-04-08 with reprex v2.0.2