How to determine absolute row number using which.min and tapply?


My dataset, named ds, is a data frame with three columns and 4000+ observations. The three columns in ds are:

name v2 f1
  1. name is character
  2. v2 is numeric
  3. f1 is factor with 54 levels

I want to find the position of the minimum of v2 for level x of the factor f1. I tried to use tapply as follows:

tapply(ds$v2, ds$f1 == x, which.min)

The answer I get is something like this:

FALSE  TRUE 
 2821    19

I presumed that 19 is the absolute position in my dataset and if I want to find the name of the observation all I need to do is

ds[19, 1]

But apparently that is incorrect. I now understand that 19 is the relative position, i.e. it is the 19th observation for factor level x.

So my question is: How can I find the absolute position for min value of factor x?


Solution

  • tapply applies the function to each unique value of its second argument, so rather than ds$f1 == x you should probably pass just ds$f1, so that it looks like:

    tapply(ds$v2, ds$f1, which.min)
    

    Here is an example with the iris data set that comes with R:

    tapply(iris$Sepal.Length, iris$Species, which.min)
    

    EDIT:

    However, as you noted, this will give you the position within the subsetted data and not the absolute position.

    I don't think it's possible to get the absolute position from tapply used this way, because the function only ever sees the values of a single vector for one group at a time. If you want to work with multiple columns at once, you can use this kind of approach:

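    # Split the data frame into one data frame per species; row names are preserved
    d <- split(iris, iris$Species)
    # For each piece, take the row name of the row with the smallest Sepal.Length
    row_positions <- sapply(d, function(x) rownames(x[which.min(x$Sepal.Length), ]))
    # Row names are character, so this selects the matching rows of the full data set
    iris[row_positions, ]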
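
For the original question, another option is to translate the relative position back into a row number with which(). The following is only a minimal sketch, not something taken from the answer above: the small ds built here is hypothetical toy data that merely reuses the column names from the question, and x is assumed to hold the factor level of interest.

```r
# Toy stand-in for `ds` (hypothetical data; column names follow the question).
ds <- data.frame(
  name = c("a", "b", "c", "d", "e", "f"),
  v2   = c(5, 3, 9, 1, 7, 2),
  f1   = factor(c("p", "q", "p", "q", "p", "q")),
  stringsAsFactors = FALSE
)
x <- "q"  # assumed factor level of interest

# Absolute row numbers of all observations in level x.
rows_x <- which(ds$f1 == x)

# which.min() returns the position within the subset; indexing rows_x with it
# translates that back into a row number of the full data set.
abs_pos <- rows_x[which.min(ds$v2[rows_x])]
ds[abs_pos, "name"]

# The same idea for every level at once: give tapply() the row indices
# instead of the values, and look the values up inside the function.
abs_by_level <- tapply(seq_len(nrow(ds)), ds$f1,
                       function(i) i[which.min(ds$v2[i])])
abs_by_level
```

Because rows_x holds the original row numbers, the result can be used directly as ds[abs_pos, ] without depending on row names.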