Tags: r, aggregate, tapply

Group a data frame and get the row at a specific index within each group in R


I have a data frame df like this:

ProjectID Dist
  1        x
  1        y
  2        z
  2        x
  2        h
  3        k
  ....     ....

and a vector of indices with one element per unique ProjectID, i.e. of length length(unique(df$ProjectID)), like

  2        
  3        
  1        
  ....    

For each ProjectID, I would like to get the Dist value whose row position within that group is given by the corresponding element of the index vector. So the result I want looks like

ProjectID Dist
  1        y
  2        h
  3        k
  ....     ....

I tried

aggregate(XRKL ~ ID, FUN=..?, data=df)

but I'm not sure where to put the vector of indices. Is there a way to get the right result with dplyr functions, tapply, or aggregate? Or do I need to write a function of my own? Thank you.


Solution

  • You can add the indices to the data frame itself and then select the corresponding row from each group.

    library(dplyr)
    
    inds <- c(2, 3, 1)
    
    df %>%
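      # Attach each group's target index, matching by order of first appearance
      mutate(inds = inds[match(ProjectID, unique(ProjectID))]) %>%
      # If ProjectID is sequential (1, 2, 3, ...), this simpler form also works:
      # mutate(inds = inds[ProjectID]) %>%
      group_by(ProjectID) %>%
      # Keep only the row at that index within each group
      slice(first(inds)) %>%
      ungroup() %>%
      select(-inds)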
    
    #  ProjectID Dist 
    #      <int> <chr>
    #1         1 y    
    #2         2 h    
    #3         3 k    
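
Since the question also asks about base R, here is a minimal sketch of the same idea without dplyr. It assumes, like the match() call above, that the index vector is ordered by the first appearance of each ProjectID. The names groups, picked and result are only illustrative, and the sample data is repeated so the snippet runs on its own.

```r
# Sample data, repeated here so the snippet is self-contained
df <- data.frame(
  ProjectID = c(1L, 1L, 2L, 2L, 2L, 3L),
  Dist = c("x", "y", "z", "x", "h", "k"),
  stringsAsFactors = FALSE
)

# One within-group row index per unique ProjectID (assumed order: first appearance)
inds <- c(2, 3, 1)

# Split the rows by group, keeping the groups in order of first appearance
groups <- split(df, factor(df$ProjectID, levels = unique(df$ProjectID)))

# Take row inds[i] from the i-th group, then stack the picked rows
picked <- Map(function(g, i) g[i, , drop = FALSE], groups, inds)
result <- do.call(rbind, picked)
rownames(result) <- NULL

result
# Dist is "y", "h", "k" for ProjectID 1, 2, 3
```

If an index is larger than the number of rows in its group, g[i, ] returns a row of NA values rather than an error, so it may be worth checking the indices against the group sizes first.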
    

    data

    df <- structure(list(ProjectID = c(1L, 1L, 2L, 2L, 2L, 3L), Dist = c("x", 
    "y", "z", "x", "h", "k")), class = "data.frame", row.names = c(NA, -6L))