r, regex, summary

Find partially similar string elements and summarize the data


I have a dataset that I want to summarize by (let's say) the first three characters of a column. In other words, I want to combine the rows whose title starts with the same three letters. For example:

df
title freq
ACM100    3
ACM200    2
ACM300    2
MAT11     1
MAT21     2
CMP00     3
CMP10     3

I want to group the data frame by the first three characters of title and sum freq within each group.

result:
title  freq
ACM    7
MAT    3
CMP    6

I would appreciate help doing this in R.


Solution

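  • You can use aggregate with transform. transform replaces title with its first three characters (via substr), and aggregate then sums freq within each group: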

    aggregate(freq ~ title, transform(df, title = substr(title, 1, 3)), sum)
    #   title freq
    # 1   ACM    7
    # 2   CMP    6
    # 3   MAT    3
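
    Note that aggregate returns the groups in alphabetical order (ACM, CMP, MAT), while the result in the question lists them in order of first appearance (ACM, MAT, CMP). If that order matters, one possible sketch is to turn the prefix into a factor whose levels follow first appearance. The data frame below is rebuilt from the question's example, and the names prefix, df2, and result are only illustrative:

```r
# Rebuild the example data from the question.
df <- data.frame(
  title = c("ACM100", "ACM200", "ACM300", "MAT11", "MAT21", "CMP00", "CMP10"),
  freq  = c(3, 2, 2, 1, 2, 3, 3),
  stringsAsFactors = FALSE
)

# Grouping key: the first three characters of each title.
prefix <- substr(df$title, 1, 3)

# Work on a copy and store the key as a factor whose levels follow
# the order of first appearance, so groups are not sorted alphabetically.
df2 <- df
df2$title <- factor(prefix, levels = unique(prefix))

# Sum freq within each prefix group.
result <- aggregate(freq ~ title, data = df2, FUN = sum)
print(result)
```

    With the example data this should give ACM, MAT, CMP with sums 7, 3, and 6. The title column of result is a factor here; wrap it in as.character() if a plain character column is needed.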