I'm working in R. I have a data frame, df, that looks like this:
> str(exp)
'data.frame': 691200 obs. of 19 variables:
$ groupname: Factor w/ 8 levels "rowA","rowB",..: 1 1 1 1 1 1 1 1 1 1 ...
$ location : Factor w/ 96 levels "c1","c10","c11",..: 1 2 3 4 12 23 34 45 56 67 ...
$ starttime: num 0 0 0 0 0 0 0 0 0 0 ...
$ inadist : num 0 0.2 0 0.2 0.6 0 0 0 0 0 ...
$ smldist : num 0 2.1 0 1.8 1.2 0 0 0 0 3.3 ...
$ lardist : num 0 0 0 0 0 0 0 0 0 1.3 ...
$ fPhase : Factor w/ 2 levels "Light","Dark": 2 2 2 2 2 2 2 2 2 2 ...
$ fCycle : Factor w/ 6 levels "predark","Cycle 1",..: 1 1 1 1 1 1 1 1 1 1 ...
I'd like to add another column, timepoint, that gives the starttime relative to the beginning of the fCycle it is in. So starttime=1801 would be timepoint=1 for fCycle='Cycle 1'.
What is the best way to create df$timepoint?
ETA toy dataset:
starttime fCycle timepoint
1 1 1
2 1 2
3 1 3
4 1 4
5 2 1
6 2 2
7 2 3
8 2 4
9 3 1
10 3 2
11 3 3
12 4 1
13 4 2
14 4 3
15 5 1
16 5 2
17 6 1
18 6 2
19 6 3
20 6 4
You can combine rle with sequence. Here is some sample code. Is the output what you were looking for?
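library(plyr)  # provides arrange() and ddply()
mydf = data.frame(
  starttime = 1:20,
  fCycle = c(rep(1:3, each = 4), rep(4:5, each = 3), rep(6, 2))
)
# sort data in increasing order of cycle and starttime
mydf = arrange(mydf, fCycle, starttime)
# rle() gives the length of each run of identical fCycle values;
# sequence() then counts 1..n within each run
mydf = transform(mydf, timepoint = sequence(rle(fCycle)$lengths))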
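To see why this works, it can help to look at the intermediate values on a tiny vector. This is only an illustrative sketch; the vector `cycles` is made up for the example and is not part of your data.

```r
# hypothetical, already-sorted cycle labels
cycles <- c(1, 1, 1, 2, 2, 3)

# rle() reports how long each run of identical values is
runs <- rle(cycles)
runs$lengths

# sequence() expands each run length n into 1..n and concatenates the pieces
timepoint <- sequence(runs$lengths)
timepoint
```

Note that this relies on the rows being sorted so that each fCycle forms a single contiguous run, which is why the data is arranged first.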
NOTE: Since there could be identical starttimes within the same fCycle, here is an alternative approach using rank and ddply:
# tied starttimes within an fCycle all get the smallest rank of the tie
ddply(mydf, .(fCycle), transform, timepoint = rank(starttime, ties.method = 'min'))
# tied starttimes within an fCycle get the average of their ranks
ddply(mydf, .(fCycle), transform, timepoint = rank(starttime, ties.method = 'average'))
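If you would rather avoid the plyr dependency, a base R sketch with ave() might also do the job. This is a different technique from the ones above: it computes the offset from the first starttime in each cycle rather than a rank, so it assumes each fCycle begins at its smallest starttime and that you want gaps in starttime to be preserved in timepoint.

```r
mydf <- data.frame(
  starttime = 1:20,
  fCycle = c(rep(1:3, each = 4), rep(4:5, each = 3), rep(6, 2))
)

# within each fCycle, subtract the first (smallest) starttime and add 1,
# so the first observation of every cycle gets timepoint 1
mydf$timepoint <- ave(
  mydf$starttime, mydf$fCycle,
  FUN = function(x) x - min(x) + 1
)
```

With this version identical starttimes get identical timepoints, and the rows do not need to be sorted beforehand.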