Can you order the rows of a dataframe when the names are duplicated? I have a dataset that is very large and I cannot rename all the duplicated variables, it looks something like this
stat | value |
---|---|
c | 3 |
d | 7 |
a | 9 |
b | 5 |
b | 1 |
c | 5 |
e | 8 |
f | 5 |
And I have to order the rows, alongside with its corresponding value, before exporting as an excel file (ie I need an excel with an exact format).
stat | value |
---|---|
a | 9 |
b | 1 |
c | 3 |
d | 7 |
b | 5 |
c | 5 |
e | 8 |
f | 5 |
Edit: Sorry seems like there is some confusion on the order for "stat" in the second example. For the purpose of this question, please assume there is no specific logic for the order of "stat" in the second example, except that it has to be ordered exactly like so.
For context, I need to export the dataframe formatted in an very specific order, there's no particular reason for it except it is company standard procedure.
Data:
xyzzy = structure(list(stat = c("c", "d", "a", "b", "b", "c", "e", "f"
), value = c(3L, 7L, 9L, 5L, 1L, 5L, 8L, 5L)), class = "data.frame", row.names = c(NA,
-8L))
the ordering of stat
is clear to me but I don't understand if there is any certain order of value
.
Here is a solution using data.table
:
library(data.table)
# n samples
n <- 400
set.seed(0123)
# create table
df <- data.table::data.table(
stat = sample(letters[1:5], size = n, replace = TRUE),
value = sample(0:100, size = n, replace = TRUE)
)
# set initial order
data.table::setorder(df, stat, -value)
# get position by stat
df[, i := seq_len(.N), by = stat]
# now order again
data.table::setorder(df, i, stat)
# remove the index
df[, i := NULL]
> head(df, 10)
stat value
<char> <int>
1: a 100
2: b 100
3: c 100
4: d 100
5: e 100
6: a 99
7: b 100
8: c 99
9: d 98
10: e 99
> tail(df, 10)
stat value
<char> <int>
1: a 5
2: b 3
3: e 2
4: a 3
5: b 2
6: e 1
7: a 1
8: b 1
9: a 0
10: b 1
However, if value should NOT be ordered in any specific way. Don't order by it originally
# set initial order
data.table::setorder(df, stat, -value)
Should instead be:
# set initial order
data.table::setorder(df, stat)
Edit
If you want to order stat non-alphabetically, this will get you that:
# n samples
n <- 400
set.seed(0123)
# create table
df <- data.table::data.table(
stat = sample(letters[1:5], size = n, replace = TRUE),
value = sample(0:100, size = n, replace = TRUE)
)
# add custom order
custom_order <- data.table::data.table(
stat = sample(letters[1:5]),
order = 1:5
)
# merge to table
df <- data.table::merge.data.table(
df,
custom_order,
by = "stat"
)
# set initial order
data.table::setorder(df, order, -value)
# get position by stat
df[, i := seq_len(.N), by = order]
# now order again
data.table::setorder(df, i, order)
# remove the index & order
df[, i := NULL]
df[, order := NULL]
Where
> custom_order
stat order
<char> <int>
1: a 1
2: d 2
3: c 3
4: e 4
5: b 5
> head(df)
stat value
<char> <int>
1: a 100
2: d 100
3: c 100
4: e 100
5: b 100
6: a 99
> tail(df)
stat value
<char> <int>
1: e 1
2: b 2
3: a 1
4: b 1
5: a 0
6: b 1