Search code examples
rdataframesummarize

How can I count the number of unique values in a column for each unique value of a different variable in R?


I have the following dataframe:

No Name Fruit Country
1  Tom  Pear  France
2  Tom  Pear  France
3  Tom  Pear  Poland
4  Bob  Kiwi  Poland
5  Bob  Pear  France
6  Ann  Pear  France
7  Ann  Pear  Poland
8  Dod  Pear  Poland
7  Dod  Plum  Poland
8  Dod  Plum  Poland
9  Dod  Pear  Germany
10 Dod  Kiwi  Estonia

How can I, for each unique name, count how many unique values there are in the fruit column and then add this number as a column to the dataframe?

The output I would like to get is this:

 No Name Fruit Country Unique
1  Tom  Pear  France   1
2  Tom  Pear  France   1
3  Tom  Pear  Poland   1
4  Bob  Kiwi  Poland   2
5  Bob  Pear  France   2
6  Ann  Pear  France   1
7  Ann  Pear  Poland   1
8  Dod  Pear  Poland   3
7  Dod  Plum  Poland   3
8  Dod  Plum  Poland   3
9  Dod  Pear  Germany  3
10 Dod  Kiwi  Estonia  3

I tried

df %>%
group_by(Name, Fruit) %>%
mutate(unique = n()) %>%
ungroup()

But the above does not work the way I wanted it to.


Solution

  • I'm not sure if your intent was just grouping by name and fruit, but its better to use mutate here. You also wouldn't use ungroup after unless you were doing manipulations to the data after:

    df %>% 
      group_by(Name,
               Fruit) %>% 
      mutate(Name.Fruit.Unique = n()) 
    

    Which gives you this. You can see for example Tom and Pear show up three times, and it is listed so in the data frame:

          No Name  Fruit Country Name.Fruit.Unique
       <dbl> <chr> <chr> <chr>               <int>
     1     1 Tom   Pear  France                  3
     2     2 Tom   Pear  France                  3
     3     3 Tom   Pear  Poland                  3
     4     4 Bob   Kiwi  Poland                  1
     5     5 Bob   Pear  France                  1
     6     6 Ann   Pear  France                  2
     7     7 Ann   Pear  Poland                  2
     8     8 Dod   Pear  Poland                  2
     9     7 Dod   Plum  Poland                  2
    10     8 Dod   Plum  Poland                  2
    11     9 Dod   Pear  Germany                 2
    12    10 Dod   Kiwi  Estonia                 1
    

    Edit

    It looks like I misinterpreted your question. For unique values grouped by name and then counting the unique number of fruits per person, this may be a better option:

    df %>% 
      group_by(Name) %>% 
      mutate(Count = n_distinct(Fruit))
    

    Which gives you this:

    # A tibble: 12 × 5
    # Groups:   Name [4]
          No Name  Fruit Country Count
       <dbl> <chr> <chr> <chr>   <int>
     1     1 Tom   Pear  France      1
     2     2 Tom   Pear  France      1
     3     3 Tom   Pear  Poland      1
     4     4 Bob   Kiwi  Poland      2
     5     5 Bob   Pear  France      2
     6     6 Ann   Pear  France      1
     7     7 Ann   Pear  Poland      1
     8     8 Dod   Pear  Poland      3
     9     7 Dod   Plum  Poland      3
    10     8 Dod   Plum  Poland      3
    11     9 Dod   Pear  Germany     3
    12    10 Dod   Kiwi  Estonia     3