I would like to sort the columns of a large dataframe (around 14000 variables) based on a specific rationale.
The column names have the following structure (Condition_Sleepstage_Parameter_Electrode_Nightpart):
[1] "Adapt_N2_negLengthLoc_C3_firstHour" "Adapt_N3_negLengthLoc_C3_firstHour"
[3] "Adapt_NREM_negLengthLoc_C3_firstHour" "Book_N2_negLengthLoc_C3_firstHour"
[5] "Book_N3_negLengthLoc_C3_firstHour" "Book_NREM_negLengthLoc_C3_firstHour"
R has the columns ordered in a purely alphabetical structure but I would like to have them in a logical order based on the following system:
First, variables should be presented in blocks per parameter. (order: "negLengthLoc", "posLength", "wholeLength", "negPeak", "nbnegPeaks", "initialMeannegSlope", "finalMeannegSlope", "initialMaxnegslope", "finalMaxnegslope", "posPeak", "nbposPeaks", "initialMeannposSlope", "finalMeanposSlope", "initialMaxposSlope", "PeaktoPeak", "Number", "Density")
Within these blocks, the highest order hierarchy should be due to Condition. (order: "Adapt", "NoFilter", "Filter", "Book").
Thereafter, the next hierarchy should be defined by Electrode. (order: "F3", "Fz", "F4", "C3", "Cz", "C4", "P3", "Pz", "P4", "O1", "O2").
Thereafter by Nightpart (order:"firstHour", "firstQuarter", "secondQuarter", "thirdQuarter", "fourthQuarter", "wholeNight") and lastly by Sleepstage ("order: "N2", "N3", "NREM").
The resulting order should look like:
[1] "Adapt_N2_negLengthLoc_F3_firstHour" "Adapt_N3_negLengthLoc_F3_firstHour"
[3] "Adapt_NREM_negLengthLoc_F3_firstHour" "Adapt_N2_negLengthLoc_F3_firstQuarter"
[5] "Adapt_N3_negLengthLoc_F3_firstQuarter" "Adapt_NREM_negLengthLoc_F3_firstQuarter"
[7] "Adapt_N2_negLengthLoc_F3_secondQuarter" "Adapt_N3_negLengthLoc_F3_secondQuarter"
[9] "Adapt_NREM_negLengthLoc_F3_secondQuarter" "Adapt_N2_negLengthLoc_F3_thirdQuarter"
[11] "Adapt_N3_negLengthLoc_F3_thirdQuarter" "Adapt_NREM_negLengthLoc_F3_thirdQuarter"
[13] "Adapt_N2_negLengthLoc_F3_fourthQuarter" "Adapt_N3_negLengthLoc_F3_fourthQuarter"
[15] "Adapt_NREM_negLengthLoc_F3_fourthQuarter" "Adapt_N2_negLengthLoc_F3_wholeNight"
[17] "Adapt_N3_negLengthLoc_F3_wholeNight" "Adapt_NREM_negLengthLoc_F3_wholeNight"
[19] "Adapt_N2_negLengthLoc_Fz_firstHour" "Adapt_N3_negLengthLoc_Fz_firstHour"
...
I hope that someone can help me, if there are any further questions then I am of course happy to provide more information!
Thanks in advance!
Illustrating with the mtcars
data, one can reorder the columns in a data frame by creating a vector with the the desired order, and using this in the column specification with the [
form of the extract operator.
First, we'll use colnames()
to extract the original order of the columns and print them
theNames <- colnames(mtcars)
theNames
> theNames
[1] "mpg" "cyl" "disp" "hp" "drat" "wt" "qsec" "vs" "am" "gear" "carb"
Next, we'll move all of the integer columns to the left side of the data frame by creating a reorderedNames
vector and using it with [
.
reorderedNames <- c("cyl" , "vs" , "am" , "gear" ,"carb","disp" ,
"drat", "wt" , "qsec", "mpg")
mtcars[,reorderedNames]
...and the first few rows of output:
> mtcars[,reorderedNames]
cyl vs am gear carb disp drat wt qsec mpg
Mazda RX4 6 0 1 4 4 160.0 3.90 2.620 16.46 21.0
Mazda RX4 Wag 6 0 1 4 4 160.0 3.90 2.875 17.02 21.0
Datsun 710 4 1 1 4 1 108.0 3.85 2.320 18.61 22.8
Hornet 4 Drive 6 1 0 3 1 258.0 3.08 3.215 19.44 21.4
Hornet Sportabout 8 0 0 3 2 360.0 3.15 3.440 17.02 18.7
Valiant 6 1 0 3 1 225.0 2.76 3.460 20.22 18.1
Duster 360 8 0 0 3 4 360.0 3.21 3.570 15.84 14.3
Merc 240D 4 1 0 4 2 146.7 3.69 3.190 20.00 24.4
In the OP, the question references a data frame with a large number of columns. In order to extend this process to automate the sorting of columns, there are at least two major approaches.
pivot_longer()
to split the column names into the required grouping variables.We'll use data from the OP to illustrate approach 1.
columnName <- c("Adapt_N2_negLengthLoc_C3_firstHour","Adapt_N3_negLengthLoc_C3_firstHour",
"Adapt_NREM_negLengthLoc_C3_firstHour","Book_N2_negLengthLoc_C3_firstHour",
"Book_N3_negLengthLoc_C3_firstHour","Book_NREM_negLengthLoc_C3_firstHour")
splitCols <- strsplit(columnName,"_")
results <- lapply(splitCols,function(x){
parameter <- x[3]
condition <- x[1]
electrode <- x[4]
nightpart <- x[5]
sleepstage <- x[2]
data.frame(parameter,condition,electrode,nightpart,sleepstage)
})
colsData <- do.call(rbind,results)
# add original column names back into data
colsData <- cbind(columnName,colsData)
# convert to factors, specifying the factor order for sorting
conditionOrder <- c("Adapt", "NoFilter", "Filter", "Book")
parameterOrder <- c("negLengthLoc", "posLength", "wholeLength", "negPeak", "nbnegPeaks",
"initialMeannegSlope", "finalMeannegSlope", "initialMaxnegslope",
"finalMaxnegslope", "posPeak", "nbposPeaks", "initialMeannposSlope",
"finalMeanposSlope", "initialMaxposSlope", "PeaktoPeak", "Number", "Density")
electrodeOrder <- c("F3", "Fz", "F4", "C3", "Cz", "C4", "P3", "Pz", "P4", "O1", "O2")
nightpartOrder <- c("firstHour", "firstQuarter", "secondQuarter", "thirdQuarter", "fourthQuarter", "wholeNight")
sleepstageOrder <- c("N2", "N3", "NREM")
colsData$condition <- factor(colsData$condition,levels = conditionOrder,ordered = TRUE)
colsData$parameter <- factor(colsData$parameter,levels = parameterOrder,ordered = TRUE)
colsData$electrode <- factor(colsData$electrode,levels = electrodeOrder,ordered = TRUE)
colsData$nightpart <- factor(colsData$nightpart,levels = nightpartOrder,ordered = TRUE)
colsData$sleepstage <- factor(colsData$sleepstage,levels = sleepstageOrder,ordered = TRUE)
# finally, sort by factors & create a vector for column number
library(dplyr)
colsData <- arrange(colsData,condition,parameter,electrode,nightpart,sleepstage)
colsData$colId <- 1:nrow(colsData)
colsData
...and the output:
> colsData
columnName parameter condition electrode nightpart
1 Adapt_N2_negLengthLoc_C3_firstHour negLengthLoc Adapt C3 firstHour
2 Adapt_N3_negLengthLoc_C3_firstHour negLengthLoc Adapt C3 firstHour
3 Adapt_NREM_negLengthLoc_C3_firstHour negLengthLoc Adapt C3 firstHour
4 Book_N2_negLengthLoc_C3_firstHour negLengthLoc Book C3 firstHour
5 Book_N3_negLengthLoc_C3_firstHour negLengthLoc Book C3 firstHour
6 Book_NREM_negLengthLoc_C3_firstHour negLengthLoc Book C3 firstHour
sleepstage colId
1 N2 1
2 N3 2
3 NREM 3
4 N2 4
5 N3 5
6 NREM 6
>
At this point we can use colsData$columnName
to order the original data frame.