I have a data frame and I want to return a ranking within each Category based on PCC.
> head(newdf)
            ItemId    Category PCC
1       5063660193 Go to Gifts   2
2   24154563660193 Go to Gifts   1
2.1 24154563660193   All Gifts   1
3   26390063660193 Go to Gifts   3
3.1 26390063660193   All Gifts   3
4         18700100 Go to Gifts   1
I initially thought of doing it with the sqldf package, but unfortunately one of its dependencies (tcltk) is not available for R version 3.0.2.
With sqldf, a call similar to the following should do the job:
# ranking by category
# (rank() over (...) needs a backend with window functions, e.g. SQLite >= 3.25)
library(sqldf)
rank <- sqldf("select
    nf.ItemId,
    nf.Category,
    nf.PCC,
    rank() over (partition by nf.Category order by nf.PCC, nf.ItemId) as Ranks
  from
    newdf as nf
  order by
    nf.Category,
    Ranks")
Do you know any alternative I can use?
These are only a handful of the different ways to do this:
dat <- read.table(text = " ItemId Category PCC
5063660193 'Go to Gifts' 2
24154563660193 'Go to Gifts' 1
24154563660193 'All Gifts' 1
26390063660193 'Go to Gifts' 3
26390063660193 'All Gifts' 3
18700100 'Go to Gifts' 1",header = TRUE,sep = "")
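# plyr: split by Category, add the rank column, recombine
library(plyr)
ddply(dat, .(Category), transform, val = rank(PCC))
# dplyr: group by Category, then mutate within each group
library(dplyr)
mutate(group_by(dat, Category), val = rank(PCC))
# data.table: add the column by reference, grouped by the key
library(data.table)
dat1 <- data.table(dat)
setkey(dat1, Category)
dat1[, val := rank(PCC), by = key(dat1)]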
Also, I am able to load sqldf on R 3.0.2 just fine, so I'm not sure what the problem is on your end.
These calls use the default behavior of rank, which is ties.method = "average": tied values each receive the mean of the ranks they span. See ?rank and the ties.method argument to customize it to your exact needs.
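
To make the effect of ties.method concrete, here is a minimal base R sketch that needs no extra packages. It uses ave() to apply rank() within each Category. The data frame is retyped from the question, and the column names rank_avg, rank_min and rank_first are just illustrative choices, not anything from the original post. As far as I can tell, ties.method = "min" is the closest match to what SQL's rank() gives when ordering by PCC alone, but check it against your own data.

```r
# Sample data retyped from the question
dat <- data.frame(
  ItemId   = c(5063660193, 24154563660193, 24154563660193,
               26390063660193, 26390063660193, 18700100),
  Category = c("Go to Gifts", "Go to Gifts", "All Gifts",
               "Go to Gifts", "All Gifts", "Go to Gifts"),
  PCC      = c(2, 1, 1, 3, 3, 1),
  stringsAsFactors = FALSE
)

# Helper (hypothetical name): rank PCC within each Category
# using the given ties.method
rank_within <- function(method) {
  ave(dat$PCC, dat$Category,
      FUN = function(x) rank(x, ties.method = method))
}

dat$rank_avg   <- rank_within("average")  # default: ties share the mean rank
dat$rank_min   <- rank_within("min")      # ties share the lowest rank
dat$rank_first <- rank_within("first")    # ties broken by order of appearance

# Show the result sorted by category and rank
print(dat[order(dat$Category, dat$rank_min), ])
```

With this data the two tied rows in "Go to Gifts" (PCC of 1) get 1.5 each under "average", 1 each under "min", and 1 and 2 under "first".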