Group data according to frequency of values in a column in a data frame using R


I have a data frame like the following:

a  b
1  23
2  34
1  34
3  45
1  56
3  567
2  67
2  90
1  91
3  98

I want to add a column holding the frequency of each value in the first column, and then sort the rows so that the most frequent values come first while rows with the same value of a stay together. The output should look like the following:

a  b  freq
1  23   4
1  34   4
1  56   4
1  91   4
2  34   3
2  67   3
2  90   3
3  45   3
3  567  3
3  98   3

I have written the following code in R:

library(data.table)
setDT(df)[,freq := .N, by = "a"]
sorted = df[order(freq, decreasing = T),]
sorted

However, I get the following output, in which the rows for a = 2 and a = 3 (both with freq 3) are interleaved instead of grouped:

    a  b freq
 1: 1  23    4
 2: 1  34    4
 3: 1  56    4
 4: 1  91    4
 5: 2  34    3
 6: 3  45    3
 7: 3  567   3
 8: 2  67    3
 9: 2  90    3
10: 3  98    3

How can I solve this problem?


Solution

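  • We can use n() to count the rows in each group, then sort by frequency (descending) and use a to break ties between groups with the same frequency

    library(dplyr)
    # df1 is the data frame from the question
    df1 %>%
        group_by(a) %>%
        mutate(freq = n()) %>%          # rows per value of a
        arrange(desc(freq), a)          # most frequent first, ties kept together by a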
    # A tibble: 10 x 3
    # Groups:   a [3]
    #       a     b  freq
    #  <int> <int> <int>
    # 1     1    23     4
    # 2     1    34     4
    # 3     1    56     4
    # 4     1    91     4
    # 5     2    34     3
    # 6     2    67     3
    # 7     2    90     3
    # 8     3    45     3
    # 9     3   567     3
    #10     3    98     3
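
The code in the question already uses data.table (setDT, := and .N), so the same idea can be sketched there as well. Ordering by freq alone leaves groups with equal counts interleaved; adding a as a second sort key keeps each group together. This is a minimal sketch, assuming integer columns and rebuilding the example data by hand; the name sorted comes from the question.

```r
library(data.table)

# Example data from the question, rebuilt by hand (integer columns assumed).
df <- data.frame(
  a = c(1L, 2L, 1L, 3L, 1L, 3L, 2L, 2L, 1L, 3L),
  b = c(23L, 34L, 34L, 45L, 56L, 567L, 67L, 90L, 91L, 98L)
)

setDT(df)                       # convert to a data.table by reference
df[, freq := .N, by = a]        # add the number of rows for each value of a
sorted <- df[order(-freq, a)]   # most frequent first; ties broken by a
sorted
```

The only change from the question's approach is the second sort key: order(-freq, a) instead of order(freq, decreasing = T).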