Search code examples
rcolumnsorting

How to order columns by numbers in R?


I have data.frame with the column names shown in the example. V1.1 = test series 1, day 1, K1-K3 are three unique samples. I want to order my columns by days, so that the order would be V1.1, V1.2, V1.14, V1.21. I think that this is no big problem, but I can't find the correct answer for my problem in the forum.

  A tibble: 6 x 168
      genus V1.1.K1 V1.1.K2 V1.1.K3 V1.14.K16 V1.14.K17 V1.14.K18 V1.2.K4 V1.2.K5 V1.2.K6 V1.21.K19 V1.21.K20 V1.21.K21
      <chr>   <dbl>   <dbl>   <dbl>     <dbl>     <dbl>     <dbl>   <dbl>   <dbl>   <dbl>     <dbl>     <dbl>     <dbl>
    1 A17        58       7      78         0         0         0       0       0       0         0         0         0
    2 Acan~       0       0       0         0         0         0       0       0       0         0         0         0
    3 Acet~       8       0       5       155       256       341       0      28      26       188       142        83
    4 Achr~       0       0       0         0         0         0       0       0       0         0         0         0
    5 Acid~       0       0       7         0         0         0       0      83       6        43        13         0
    6 Acid~       0       0       0         0         0         0       0       0       0         0         0         0

First, I tried to solve the problem with

select(1:3, 7:9, 4:5, 11:13)

But then it is not reproducible to other data.frames, I am working with, because there the amount of colums or the days are different.

Or is there an argument to add to my data-import, which is done by

read_tsv()

? You would help me a lot :) Kathrin


Solution

  • Since you want them ordered by the numbers, disregarding the letters (or lexicographic sorts), then here's a way. Since you're using select, I'll infer you have %>% and others.

    library(magrittr) # %>%, not needed if you have dplyr loaded
    library(stringr)  # str_extract_all
    ord <- str_extract_all(names(dat), "\\d+") %>%
      lapply(., `length<-`, max(lengths(.))) %>%
      lapply(., as.integer) %>%
      do.call(rbind.data.frame, .) %>%
      do.call(function(...) order(..., na.last = FALSE), .)
    
    ord
    #  [1]  1  2  3  4  8  9 10  5  6  7 11 12 13
    dat[,ord]      # or select(dat, all_of(ord))
    #   genus V1.1.K1 V1.1.K2 V1.1.K3 V1.2.K4 V1.2.K5 V1.2.K6 V1.14.K16 V1.14.K17 V1.14.K18 V1.21.K19 V1.21.K20 V1.21.K21
    # 1   A17      58       7      78       0       0       0         0         0         0         0         0         0
    # 2 Acan~       0       0       0       0       0       0         0         0         0         0         0         0
    # 3 Acet~       8       0       5       0      28      26       155       256       341       188       142        83
    # 4 Achr~       0       0       0       0       0       0         0         0         0         0         0         0
    # 5 Acid~       0       0       7       0      83       6         0         0         0        43        13         0
    # 6 Acid~       0       0       0       0       0       0         0         0         0         0         0         0
    

    The na.last=FALSE forces (in this case) genus to be first. If you have various names with and without numbers, you may get inconsistent results, in which case you may need to subset the column names you use with this.


    Data

    dat <- structure(list(genus = c("A17", "Acan~", "Acet~", "Achr~", "Acid~", "Acid~"), V1.1.K1 = c(58L, 0L, 8L, 0L, 0L, 0L), V1.1.K2 = c(7L, 0L, 0L, 0L, 0L, 0L), V1.1.K3 = c(78L, 0L, 5L, 0L, 7L, 0L), V1.14.K16 = c(0L, 0L, 155L, 0L, 0L, 0L), V1.14.K17 = c(0L, 0L, 256L, 0L, 0L, 0L), V1.14.K18 = c(0L, 0L, 341L, 0L, 0L, 0L), V1.2.K4 = c(0L, 0L, 0L, 0L, 0L, 0L), V1.2.K5 = c(0L, 0L, 28L, 0L, 83L, 0L), V1.2.K6 = c(0L, 0L, 26L, 0L, 6L, 0L), V1.21.K19 = c(0L, 0L, 188L, 0L, 43L, 0L), V1.21.K20 = c(0L, 0L, 142L, 0L, 13L, 0L), V1.21.K21 = c(0L, 0L, 83L, 0L, 0L, 0L)), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6"))