I have a data frame, df2, containing observations grouped by an ID factor, and I would like to subset it. I have used another function to identify which row within each factor group I want to select. This is shown below in df:
df <- data.frame(ID = c("A","B","C"),
pos = c(1,3,2))
df2 <- data.frame(ID = c(rep("A",5), rep("B",5), rep("C",5)),
obs = c(1:15))
In df, pos is the index of the row I want to select within the factor level given in ID, not within the whole data frame df2. I'm looking for a way to select the row for each ID according to the right index (i.e. its row number within that ID's level in df2).
So, in this example, I want to select the first value in df2 with ID == 'A', the third value with ID == 'B', and the second value with ID == 'C'.
This would then give me:
df3 <- data.frame(ID = c("A", "B", "C"),
obs = c(1, 8, 12))
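To make "index within the group" concrete, here is a minimal sketch of a different route than the merge() answer below: split obs into one vector per ID with split(), then use mapply() to take element pos of each ID's vector. The names groups and picked are illustrative only.

```r
df  <- data.frame(ID = c("A", "B", "C"), pos = c(1, 3, 2))
df2 <- data.frame(ID = c(rep("A", 5), rep("B", 5), rep("C", 5)), obs = 1:15)

# One vector of obs per ID level, e.g. groups[["B"]] is 6 7 8 9 10
groups <- split(df2$obs, df2$ID)

# For each row of df, take element `pos` of that ID's vector.
# as.character() guards against df$ID being a factor, which would
# otherwise index the list by integer code rather than by name.
picked <- mapply(function(v, p) v[p], groups[as.character(df$ID)], df$pos)

df3 <- data.frame(ID = df$ID, obs = unname(picked))
df3
```

This avoids adding a helper column to df2, at the cost of looping over the IDs in df.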
Here's the base R solution:
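# number the rows 1, 2, 3, ... within each ID group
df2$pos <- ave(df2$obs, df2$ID, FUN=seq_along)
# join on the shared columns ID and pos, keeping only the matching rows
merge(df, df2)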
ID pos obs
1 A 1 1
2 B 3 8
3 C 2 12
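If you would rather not go through merge() (for example to keep df2's original row order and row names), a minimal sketch that reuses the same within-group index but filters df2 directly: match() looks up the wanted position for each row's ID, and rows are kept where the two agree. The name target is illustrative; which() is used so that IDs missing from df (which give NA) are dropped instead of producing NA rows.

```r
df  <- data.frame(ID = c("A", "B", "C"), pos = c(1, 3, 2))
df2 <- data.frame(ID = c(rep("A", 5), rep("B", 5), rep("C", 5)), obs = 1:15)

# Row number within each ID group (same as the answer's first line)
df2$pos <- ave(df2$obs, df2$ID, FUN = seq_along)

# For every row of df2, the position wanted for its ID (NA if ID is not in df)
target <- df$pos[match(df2$ID, df$ID)]

# which() turns the logical test into row indices and silently drops NAs
res <- df2[which(df2$pos == target), c("ID", "obs")]
res
```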
If df2 is sorted by ID, you can just use df2$pos <- sequence(table(df2$ID)) in place of the first line.
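The sorted-by-ID condition matters because sequence(table(...)) simply emits 1..n for each level, in table()'s level order, and assumes df2's rows arrive in that same order; ave() groups rows by their ID value wherever they sit. A small check under that reading, using a hypothetical interleaved reordering of df2 called mixed:

```r
df  <- data.frame(ID = c("A", "B", "C"), pos = c(1, 3, 2))
df2 <- data.frame(ID = c(rep("A", 5), rep("B", 5), rep("C", 5)), obs = 1:15)

# Sorted by ID: both ways of numbering rows within groups agree.
# as.integer() drops names/attributes so only the values are compared.
sorted_ok <- identical(as.integer(ave(df2$obs, df2$ID, FUN = seq_along)),
                       as.integer(sequence(table(df2$ID))))

# Same rows, interleaved as A, B, C, A, B, C, ... (no longer sorted by ID)
mixed <- df2[c(1, 6, 11, 2, 7, 12, 3, 8, 13, 4, 9, 14, 5, 10, 15), ]
mixed_ok <- identical(as.integer(ave(mixed$obs, mixed$ID, FUN = seq_along)),
                      as.integer(sequence(table(mixed$ID))))

# ave() still numbers each group correctly, so the merge gives the same answer
mixed$pos <- ave(mixed$obs, mixed$ID, FUN = seq_along)
res <- merge(df, mixed)

c(sorted_ok = sorted_ok, mixed_ok = mixed_ok)   # sorted_ok TRUE, mixed_ok FALSE
```

So the sequence() shortcut is only safe after something like df2 <- df2[order(df2$ID), ].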