I'm fairly new to R so I hope somebody can help me. An output table in one of my scripts is the averagetable
below showing different proportions of the event Standing
in three different clusters:
> print(averagetable)
Group.1 Standing
1 cluster1 0.5642857
2 cluster2 0.7795848
3 cluster3 0.7922980
Note that R can assign different cluster names (cluster1
, cluster2
or cluster3
) to the values on averagetable$Standing
each time I'm running the scrip. Another output can be:
> print(averagetable)
Group.1 Standing
1 cluster1 0.7795848
2 cluster2 0.5642857
3 cluster3 0.7922980
On the other hand, my script produces the tableresults
dataframe. Please find a head()
sample below:
> head(tableresults)
ACTIVITY_X ACTIVITY_Y ACTIVITY_Z winning_cluster
1 19 21 28 cluster3
2 20 14 24 cluster3
3 34 35 49 cluster3
4 18 5 19 cluster2
5 23 27 35 cluster3
6 33 20 39 cluster3
My question is fairly simple. I would like to transform the data in tableresults
changing the string in the column winning_cluster
based on three rules:
1) Write Standing
in tableresults$wining_cluster
replacing it by the cluster name having the highest Standing
value in averagetable
.
2) Write Moving/Feeding
in tableresults$wining_cluster
replacing it by the cluster name having the second highest Standing
value in averagetable
.
3) Write Feeding/Moving
in tableresults$wining_cluster
replacing it by the cluster name having the third highest Standing
value in averagetable
.
In other words, this is the output desired:
> head(tableresults_output)
ACTIVITY_X ACTIVITY_Y ACTIVITY_Z winning_cluster
1 19 21 28 Standing
2 20 14 24 Standing
3 34 35 49 Standing
4 18 5 19 Moving/Feeding
5 23 27 35 Standing
6 33 20 39 Standing
Note that it is very important to have a value-based, hierarchical component that will assign conditions 1) 2) or 3) depending on averagetable
values. This is not solved by using:
averagetable$classification <- factor(x = as.character(sort(averagetable$Standing)),
labels = c('Feeding/Moving', 'Moving/Feeding','Standing'))
With this command Standing
will be always linked to cluster1
, Moving/Feeding
to cluster2
and Feeding/Moving
to cluster3
and that is not necessarily true when averagetable
is regenerated.
Anyways, any help is appreciated and I hope my question was interesting enough for the forum.
Here's a stab:
tableresults <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
ACTIVITY_X ACTIVITY_Y ACTIVITY_Z winning_cluster
1 19 21 28 cluster3
2 20 14 24 cluster3
3 34 35 49 cluster3
4 18 5 19 cluster2
5 23 27 35 cluster3
6 33 20 39 cluster3")
averagetable <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
Group.1 Standing
1 cluster1 0.5642857
2 cluster2 0.7795848
3 cluster3 0.7922980")
averagetable$x <- c("Standing", "Moving/Feeding", "Feeding/Moving")[ rank(-averagetable$Standing) ]
merge(tableresults, averagetable[,c(1,3)], by.x="winning_cluster", by.y="Group.1")
# winning_cluster ACTIVITY_X ACTIVITY_Y ACTIVITY_Z x
# 1 cluster2 18 5 19 Moving/Feeding
# 2 cluster3 19 21 28 Standing
# 3 cluster3 20 14 24 Standing
# 4 cluster3 34 35 49 Standing
# 5 cluster3 23 27 35 Standing
# 6 cluster3 33 20 39 Standing