Search code examples
rsortingtidyrspread

R Order only one factor level (or column if after) to affect order long to wide (using spread)


I have a problem after changing my dataset from long to wide (using spread, from the tidyr library on the Result_Type column). I have the following example df:

Group<-c("A","A","A","B","B","B","C","C","C","D", "D")
Result_Type<-c("Final.Result", "Verification","Test", "Verification","Final.Result","Fast",
               "Verification","Fast", "Final.Result", "Test", "Final.Result")
Result<-c(7,1,8,7,"NA",9,10,12,17,50,11)
df<-data.frame(Group, Result_Type, Result)

df
   Group  Result_Type Result
1      A Final.Result      7
2      A Verification      1
3      A         Test      8
4      B Verification      7
5      B Final.Result     NA
6      B         Fast      9
7      C Verification     10
8      C         Fast     12
9      C Final.Result     17
10     D         Test     50
11     D Final.Result     11

In the column Result_type there are many possible result types and in some datasets I have Result_Type 's that will not occur in other datasets. However, one level: Final.Resultdoes occur in every dataset.

Also: This is example data but the actual data has many different columns, and as these differ across the datasets I use, I used spread (from the tidyr library) so I don't have to give any specific column names other than my target columns.

library("tidyr")
df_spread<-spread(df, key = Result_Type, value = Result)

  Group Fast Final.Result Test Verification
1     A <NA>            7    8            1
2     B    9           NA <NA>            7
3     C   12           17 <NA>           10
4     D <NA>           11   50         <NA>

What I would like is that once I convert the dataset from long to wide, Final.Result is the first column, how the rest of the columns is arranged doesn't matter, so I would like it to be like this (without calling any names of the other columns that are spread, or using order index numbers):

  Group Final.Result Fast Test Verification
1     A            7 <NA>    8            1
2     B           NA    9 <NA>            7
3     C           17   12 <NA>           10
4     D           11 <NA>   50         <NA>

I saw some answers that indicated you can reverse the order of the spreaded columns, or turn off the ordering of spread, but that doesn't make sure that Final.Result is always the first column of the spread levels.

I hope I am making myself clear, it's a little complicated to explain. If someone needs extra info I will be happy to explain more!


Solution

  • spread creates columns in the order of the key column's factor levels. Within the tidyverse, forcats::fct_relevel is a convenience function for rearranging factor levels. The default is that the level(s) you specify will be moved to the front.

    library(dplyr)
    library(tidyr)
    
    ...
    
    levels(df$Result_Type)
    #> [1] "Fast"         "Final.Result" "Test"         "Verification"
    

    Calling fct_relevel will put "Final.Result" as the first level, keeping the rest of the levels in their previous order.

    reordered <- df %>%
      mutate(Result_Type = forcats::fct_relevel(Result_Type, "Final.Result"))
    
    levels(reordered$Result_Type)
    #> [1] "Final.Result" "Fast"         "Test"         "Verification"
    

    Adding that into your pipeline puts Final.Result as the first column after spreading.

    df %>%
      mutate(Result_Type = forcats::fct_relevel(Result_Type, "Final.Result")) %>%
      spread(key = Result_Type, value = Result)
    #>   Group Final.Result Fast Test Verification
    #> 1     A            7 <NA>    8            1
    #> 2     B           NA    9 <NA>            7
    #> 3     C           17   12 <NA>           10
    #> 4     D           11 <NA>   50         <NA>
    

    Created on 2018-12-14 by the reprex package (v0.2.1)