Possible duplicate: How to sort a dataframe by column(s)?
I have a data frame in different orders (in terms of columns).
ID_REF VALUE ABS_CALL DETECTION.P.VALUE
1 10071_s_at 3473.60000 P/present 0.000219000
2 1053_at 643.20000 P/present 0.000673000
3 117_at 564.00000 M/Marginal 0.000322000
4 1255_g_at 9.40000 A/absent 0.602006000
5 1294_at 845.60000 P/present 0.000468000
6 1320_at 94.30000 A/absent 0.204022000
Now below the order of columns is changed
VALUE ID_REF ABS_CALL DETECTION P-VALUE
1 3473.6 10071_s_at P/present 0.000219
2 643.2 1053_at P/present 0.000673
3 564 117_at M/marginal 0.000322
4 9.4 1255_g_at A/absent 0.602006
5 845.6 1294_at P/present 0.000468
6 94.3 1320_at A/absent 0.204022
Again it is changed
DETECTION P-VALUE VALUE ID_REF ABS_CALL
1 0.000219 3473.6 10071_s_at P
2 0.000673 643.2 1053_at P
3 0.000322 564 117_at M
4 0.602006 9.4 1255_g_at A
5 0.000468 845.6 1294_at P
6 0.204022 94.3 1320_at A
Here I have the same data frame in different column orders. I don't know the order of the data frame, so I want to put the data in the below format:
ID_REF VALUE ABS_CALL DETECTION.P.VALUE
1 10071_s_at 3473.60000 P/present 0.000219000
2 1053_at 643.20000 P/present 0.000673000
3 117_at 564.00000 M/Marginal 0.000322000
4 1255_g_at 9.40000 A/absent 0.602006000
5 1294_at 845.60000 P/present 0.000468000
6 1320_at 94.30000 A/absent 0.204022000
Here i need to check if there is a substring of _at
in any column then put that as the first column. If any column has values greater than 1 then put that column as the second column of data frame. If any column has levels P
,A
,M
or present
,absent
,marginal
then put that as the third column and finally any column with values less than 1 will be the last. Can anybody please tell me how to do this in R efficiently?
Note: Column names are not permanent, those can be anything (different names).
This isn't elegant by any means, but you could write a function that looks at only a single row and search for each of your criteria:
dat1 <- read.table(header = TRUE, check.names = FALSE,
text="ID_REF VALUE ABS_CALL DETECTION.P.VALUE
1 10071_s_at 3473.60000 P/present 0.000219000
2 1053_at 643.20000 P/present 0.000673000
3 117_at 564.00000 M/Marginal 0.000322000
4 1255_g_at 9.40000 A/absent 0.602006000
5 1294_at 845.60000 P/present 0.000468000
6 1320_at 94.30000 A/absent 0.204022000")
dat2 <- read.table(header = TRUE, check.names = FALSE,
text="VALUE ID_REF ABS_CALL 'DETECTION P-VALUE'
1 3473.6 10071_s_at P/present 0.000219
2 643.2 1053_at P/present 0.000673
3 564 117_at M/marginal 0.000322
4 9.4 1255_g_at A/absent 0.602006
5 845.6 1294_at P/present 0.000468
6 94.3 1320_at A/absent 0.204022")
dat3 <- read.table(header = TRUE, check.names = FALSE,
text=" 'DETECTION P-VALUE' VALUE ID_REF ABS_CALL
1 0.000219 3473.6 10071_s_at P
2 0.000673 643.2 1053_at P
3 0.000322 564 117_at M
4 0.602006 9.4 1255_g_at A
5 0.000468 845.6 1294_at P
6 0.204022 94.3 1320_at A")
f <- function(data) {
d1 <- data[1, , drop = FALSE]
## from row 1, separate out numerics, p-value is < 1
## and the other we assume is value
nums <- d1[, nn <- sapply(d1, is.numeric)]
p <- names(nums[, nums < 1, drop = FALSE])
val <- setdiff(names(nums), p)
## take all the other columns, find `_at$` as id
## and assume the other column is `abs_call`
ch <- d1[, !nn, drop = FALSE]
id <- names(ch[, grepl('_at$', as.character(unlist(ch))), drop = FALSE])
abs <- setdiff(names(ch), id)
## order by the name found
data[, c(id, val, abs, p)]
}
lapply(list(dat1, dat2, dat2), f)
# [[1]]
# ID_REF VALUE ABS_CALL DETECTION.P.VALUE
# 1 10071_s_at 3473.6 P/present 0.000219
# 2 1053_at 643.2 P/present 0.000673
# 3 117_at 564.0 M/Marginal 0.000322
# 4 1255_g_at 9.4 A/absent 0.602006
# 5 1294_at 845.6 P/present 0.000468
# 6 1320_at 94.3 A/absent 0.204022
#
# [[2]]
# ID_REF VALUE ABS_CALL DETECTION P-VALUE
# 1 10071_s_at 3473.6 P/present 0.000219
# 2 1053_at 643.2 P/present 0.000673
# 3 117_at 564.0 M/marginal 0.000322
# 4 1255_g_at 9.4 A/absent 0.602006
# 5 1294_at 845.6 P/present 0.000468
# 6 1320_at 94.3 A/absent 0.204022
#
# [[3]]
# ID_REF VALUE ABS_CALL DETECTION P-VALUE
# 1 10071_s_at 3473.6 P/present 0.000219
# 2 1053_at 643.2 P/present 0.000673
# 3 117_at 564.0 M/marginal 0.000322
# 4 1255_g_at 9.4 A/absent 0.602006
# 5 1294_at 845.6 P/present 0.000468
# 6 1320_at 94.3 A/absent 0.204022