r, select, unique, dplyr

Select unique values with 'select' function in 'dplyr' library


Is it possible to select all unique values from a column of a data.frame using the select function in the dplyr library? Something like "SELECT DISTINCT field1 FROM table1" in SQL notation.

Thanks!


Solution

  • As of dplyr 0.3, this is easily done with the distinct() function.

    Here is an example:

    distinct_df = df %>% distinct(field1)

    You can get a vector of the distinct values with:

    distinct_vector = distinct_df$field1

    You can also select a subset of columns in the same pipeline as the distinct() call, which makes the result cleaner to look at when you examine the data frame with head(), tail(), or glimpse():

    distinct_df = df %>% distinct(field1) %>% select(field1)
    distinct_vector = distinct_df$field1
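For anyone who wants to try this end to end, here is a small self-contained sketch. The sample data is made up for illustration (only the names df and field1 come from the answer; field2 is hypothetical), and it assumes a reasonably recent dplyr where pull() and the .keep_all argument of distinct() are available. As far as I know, newer dplyr versions return only the columns named in distinct() unless .keep_all = TRUE is set, so the extra select() is mostly needed on old releases.

```r
library(dplyr)

# Hypothetical sample data; field2 exists only to show what happens to other columns.
df <- data.frame(
  field1 = c("a", "b", "a", "c", "b"),
  field2 = c(1, 2, 3, 4, 5),
  stringsAsFactors = FALSE
)

# One row per distinct value of field1 (like SELECT DISTINCT field1 FROM df).
distinct_df <- df %>% distinct(field1)

# The same values as a plain vector, in order of first appearance.
distinct_vector <- df %>% distinct(field1) %>% pull(field1)
print(distinct_vector)  # "a" "b" "c"

# Keep the remaining columns too, taking the first row seen for each field1 value.
first_rows <- df %>% distinct(field1, .keep_all = TRUE)
print(first_rows)
```

If you only ever need the vector and not a data frame, base R's unique(df$field1) should give the same values without dplyr.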