I have a data frame looking like this that I'm preparing for a ggplot:
txt <- "v1 v2 v3
'Strongly agree' 83.1 var1
'Agree' 14.9 var1
'Disagree' 1.5 var1
'Strongly disagree' 0.6 var1
'Strongly agree' 11.8 var2
'Agree' 36.5 var2
'Disagree' 17.7 var2
'Strongly disagree' 43.8 var2
'Strongly agree' 19.6 var3
'Agree' 12 var3
'Disagree' 31.6 var3
'Strongly disagree' 36.8 var3"
mydata <- read.table(textConnection(txt), sep = " ", header = TRUE, stringsAsFactors = TRUE)
My question is: how can I order the levels in mydata$v3 based on both the values in mydata$v2 and the levels in mydata$v1?
An example: if I wanted to order the levels in mydata$v3 by the highest value in mydata$v2 within the level 'Strongly agree' in mydata$v1, the order I would get would be var1, var3, var2, because the values in mydata$v2 are 83.1, 19.6, 11.8.
Another example: if I wanted to order the levels in mydata$v3 by the sum of the values in mydata$v2 within the levels 'Strongly agree' and 'Agree' in mydata$v1, the order I would get would be var1, var2, var3, because the sums of mydata$v2 are (83.1+14.9)=98, (11.8+36.5)=48.3, (19.6+12)=31.6.
I have no idea how to approach this myself. Also, I deal with a lot of data frames like this, so the code has to go into a function.
EDIT:
In both examples, the result I'm going for is the original data.frame with ONLY the order of the levels in mydata$v3 changed.
So in example 1 I have:
v1 v2 v3
1 Strongly agree 83.1 var1
2 Agree 14.9 var1
3 Disagree 1.5 var1
4 Strongly disagree 0.6 var1
5 Strongly agree 11.8 var2
6 Agree 36.5 var2
7 Disagree 17.7 var2
8 Strongly disagree 43.8 var2
9 Strongly agree 19.6 var3
10 Agree 12.0 var3
11 Disagree 31.6 var3
12 Strongly disagree 36.8 var3
levels(mydata$v3)
[1] "var1" "var2" "var3"
but what I want to end up with is this:
v1 v2 v3
1 Strongly agree 83.1 var1
2 Agree 14.9 var1
3 Disagree 1.5 var1
4 Strongly disagree 0.6 var1
5 Strongly agree 11.8 var2
6 Agree 36.5 var2
7 Disagree 17.7 var2
8 Strongly disagree 43.8 var2
9 Strongly agree 19.6 var3
10 Agree 12.0 var3
11 Disagree 31.6 var3
12 Strongly disagree 36.8 var3
levels(mydata$v3)
[1] "var1" "var3" "var2"
In example two I have:
v1 v2 v3
1 Strongly agree 83.1 var1
2 Agree 14.9 var1
3 Disagree 1.5 var1
4 Strongly disagree 0.6 var1
5 Strongly agree 11.8 var2
6 Agree 36.5 var2
7 Disagree 17.7 var2
8 Strongly disagree 43.8 var2
9 Strongly agree 19.6 var3
10 Agree 12.0 var3
11 Disagree 31.6 var3
12 Strongly disagree 36.8 var3
levels(mydata$v3)
[1] "var1" "var2" "var3"
but want:
v1 v2 v3
1 Strongly agree 83.1 var1
2 Agree 14.9 var1
3 Disagree 1.5 var1
4 Strongly disagree 0.6 var1
5 Strongly agree 11.8 var2
6 Agree 36.5 var2
7 Disagree 17.7 var2
8 Strongly disagree 43.8 var2
9 Strongly agree 19.6 var3
10 Agree 12.0 var3
11 Disagree 31.6 var3
12 Strongly disagree 36.8 var3
levels(mydata$v3)
[1] "var1" "var2" "var3"
Notice that in example two what I have and what I want are identical, but I have a lot of data.frames in which this will not be the case.
What I'm looking for, I guess, is a complex version of
factor(mydata$v3, levels(mydata$v3)[EXAMPLE1: order after value in v2 within 1 level in v1 /EXAMPLE2: order after sum of value within 2 levels in v1])
Here's a solution with aggregate:
f <- function(mydata, v1.val) {
  # Value or sum of v2 within the selected rows
  sums <- aggregate(v2 ~ v3, data = mydata[mydata$v1 %in% v1.val, ], FUN = sum)
  # Decreasing order of the sum of v2 values, or the only v2 value, for each level of v3
  ord <- order(sums$v2, decreasing = TRUE)
  # Build a new factor with the proper levels and assign it to v3
  fac <- factor(mydata$v3, levels = sums$v3[ord])
  mydata$v3 <- fac
  return(mydata)
}
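To see what the function orders by, the intermediate sums table can be computed on its own. The sketch below rebuilds the question's data with data.frame() (an assumption made only so the snippet runs by itself) and runs the same aggregate call for both examples; the names sums1, sums2, order1 and order2 are illustrative.

```r
# Example data from the question (values copied verbatim).
mydata <- data.frame(
  v1 = rep(c("Strongly agree", "Agree", "Disagree", "Strongly disagree"), times = 3),
  v2 = c(83.1, 14.9, 1.5, 0.6, 11.8, 36.5, 17.7, 43.8, 19.6, 12, 31.6, 36.8),
  v3 = factor(rep(c("var1", "var2", "var3"), each = 4))
)

# Example 1: one v2 value per v3 level, restricted to 'Strongly agree'.
sums1 <- aggregate(v2 ~ v3,
                   data = mydata[mydata$v1 %in% "Strongly agree", ],
                   FUN = sum)

# Example 2: sum of v2 per v3 level over 'Strongly agree' and 'Agree'.
sums2 <- aggregate(v2 ~ v3,
                   data = mydata[mydata$v1 %in% c("Strongly agree", "Agree"), ],
                   FUN = sum)

# Level order implied by each table, largest value first.
order1 <- as.character(sums1$v3[order(sums1$v2, decreasing = TRUE)])
order2 <- as.character(sums2$v3[order(sums2$v2, decreasing = TRUE)])

print(sums1)
print(order1)
print(sums2)
print(order2)
```

One thing to keep in mind: a level of v3 that has no rows in the selected v1 levels does not appear in the sums table, so it would be missing from the new levels and the corresponding entries of v3 would become NA.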
The data frames look as above, but the factor levels are as desired:
> f(mydata, 'Strongly agree')$v3
[1] var1 var1 var1 var1 var2 var2 var2 var2 var3 var3 var3 var3
Levels: var1 var3 var2
> f(mydata, c('Strongly agree', 'Agree'))$v3
[1] var1 var1 var1 var1 var2 var2 var2 var2 var3 var3 var3 var3
Levels: var1 var2 var3
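Since the question mentions many data frames of the same shape, here is a possible variant, not the method above: a minimal sketch that uses base R's reorder() instead of aggregate and is applied to a list with lapply(). The helper name reorder_v3 and the list frames are hypothetical, and the sketch assumes every data frame has the columns v1, v2 and v3.

```r
# Example data from the question (values copied verbatim).
mydata <- data.frame(
  v1 = rep(c("Strongly agree", "Agree", "Disagree", "Strongly disagree"), times = 3),
  v2 = c(83.1, 14.9, 1.5, 0.6, 11.8, 36.5, 17.7, 43.8, 19.6, 12, 31.6, 36.8),
  v3 = factor(rep(c("var1", "var2", "var3"), each = 4))
)

# Hypothetical helper (not from the answer): reorder the levels of v3 by the
# sum of v2 over the rows whose v1 is in v1.val, largest sum first.
reorder_v3 <- function(df, v1.val) {
  # Rows outside the selection contribute 0 to the per-level sum.
  score <- ifelse(df$v1 %in% v1.val, df$v2, 0)
  # reorder() sorts levels by increasing score, so negate for decreasing order.
  df$v3 <- reorder(factor(df$v3), -score, FUN = sum)
  df
}

# Apply the same reordering to several data frames at once.
frames <- list(a = mydata, b = mydata)
res1 <- lapply(frames, reorder_v3, v1.val = "Strongly agree")
res2 <- lapply(frames, reorder_v3, v1.val = c("Strongly agree", "Agree"))

print(levels(res1$a$v3))
print(levels(res2$b$v3))
```

Unlike the aggregate version, this variant keeps a level of v3 that has no rows in the selected v1 levels; it simply gets a score of 0. The returned factor also carries a "scores" attribute that reorder() adds.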