Tags: r, dataframe, dplyr, sample, summarize

How to summarise taking a random value from a categorical column?


I have two species and some values for them.

values <- c(1,2,3,4,5,6,7,8,9,10)
spp <- c(rep("a",5), rep("b",5))
df <- data.frame(spp, values, stringsAsFactors = FALSE)

I want to summarise the data frame by species, keeping one randomly chosen value per species. In dplyr style, I tried this:

n.df <- df %>%
  group_by(spp) %>%
  summarise(value = sample(values))

but the sample function doesn't work inside summarise.

Does anyone have a solution?


Solution

  • Since you are already using dplyr, you can use its sample_n function, which draws random rows within each group, i.e.

    library(dplyr)
    
    df %>%
       group_by(spp) %>%
       sample_n(1)
    

    which gives, for example (the rows are drawn at random, so your output will differ),

    # A tibble: 2 x 2
    # Groups:   spp [2]
      spp   values
      <chr>  <dbl>
    1 a          2
    2 b          9
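
If you would rather keep summarise, or you are on dplyr 1.0.0 or later (where sample_n is superseded by slice_sample), the following is a minimal sketch of two alternatives on the same toy data. The seed and the object names one_per_group and one_row_per_group are only illustrative assumptions, not part of the original answer.

```r
library(dplyr)

# Same toy data as in the question
df <- data.frame(
  spp    = c(rep("a", 5), rep("b", 5)),
  values = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  stringsAsFactors = FALSE
)

set.seed(42)  # arbitrary seed, only so the draw is reproducible

# Option 1: keep summarise(), but ask sample() for exactly one value,
# so each group collapses to a single row
one_per_group <- df %>%
  group_by(spp) %>%
  summarise(value = sample(values, 1))

# Option 2: slice_sample() (dplyr >= 1.0.0) draws one whole row per group
one_row_per_group <- df %>%
  group_by(spp) %>%
  slice_sample(n = 1)
```

Both results should have one row per species. One caveat with Option 1: if a group can contain a single numeric value x, sample(x, 1) draws from 1:x rather than returning x, so the row-based Option 2 is probably the safer choice in that case.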