
Select row by level of a factor


I have a data frame, df2, containing observations grouped by an ID factor, and I would like to subset it. I have used another function to identify which row within each factor group I want to select. The result is shown below in df:

df <- data.frame(ID = c("A","B","C"),
                 pos = c(1,3,2))
df2 <- data.frame(ID = c(rep("A",5), rep("B",5), rep("C",5)),
                  obs = c(1:15))

In df, pos is the index of the row that I want to select within the factor level given in ID, not within the whole data frame df2. I'm looking for a way to select the row for each ID according to that index (that is, its row number within the corresponding factor level of df2).

So, in this example, I want to select the first value in df2 with ID == 'A', the third value in df2 with ID == 'B' and the second value in df2 with ID == 'C'.

This would then give me:

df3 <- data.frame(ID = c("A", "B", "C"),
                  obs = c(1, 8, 12))

Solution

  • Here's the base R solution:

    df2$pos <- ave(df2$obs, df2$ID, FUN=seq_along)
    merge(df, df2)
      ID pos obs
    1  A   1   1
    2  B   3   8
    3  C   2  12
    

    If df2 is sorted by ID, you can replace the first line with df2$pos <- sequence(table(df2$ID)).
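
For reference, the approach above can be put together as a self-contained script. This is a minimal sketch using the example data from the question. The variable names res and alt_pos are only illustrative, and the within-group counter is built from seq_along(df2$ID) rather than df2$obs, so it does not depend on the type of obs.

```r
# Example data from the question
df <- data.frame(ID = c("A", "B", "C"),
                 pos = c(1, 3, 2))
df2 <- data.frame(ID = c(rep("A", 5), rep("B", 5), rep("C", 5)),
                  obs = 1:15)

# Number the rows within each ID group (1, 2, ... per group).
# ave() returns the values in the original row order of df2.
df2$pos <- ave(seq_along(df2$ID), df2$ID, FUN = seq_along)

# Join on the shared columns ID and pos, keeping one row per group
res <- merge(df, df2, by = c("ID", "pos"))
print(res)

# Shortcut for data already sorted by ID (as df2 is here):
# table() counts rows per ID, sequence() expands the counts to 1..n
alt_pos <- as.integer(sequence(table(df2$ID)))
```

Naming the by columns explicitly in merge() guards against an accidental join on any other column that df and df2 might share in real data.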