
How to summarise values in a column with non-exact match in R?


I have a data.table with over ten thousand rows. I want to count how many times each value appears in one column, but using a non-exact match. The data looks like this:

library(data.table)
dt1 <- data.table(place = c("a north", "a south", "b south", "a north", "c west", "b north", "c south", "a west", "b west"))

     place
1: a north
2: a south
3: b south
4: a north
5: c west
6: b north
7: c south
8: a west
9: b west

I just want to count how many times "a", "b" and "c" appear, regardless of the words that follow. I would like the result to look like this:

   a b c
1: 4 3 2

I tried summarise, charmatch and pmatch, but they didn't work. Could anyone help?


Solution

  • You can try a full data.table solution:

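     # Strip everything from the first space onward, count each prefix,
     # then transpose so the prefixes become column names.
     dt1[, .(var = sub(" .*", "", place))
       ][, .(cnt = .N), by = var
       ][, data.table::transpose(.SD, make.names = 'var')]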
    
       a b c
    1: 4 3 2
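
The same idea can also be written as separate steps, which may be easier to follow or adapt. This is only a sketch under the assumption that the part to count is always the text before the first space; the names `counts` and `res` are made up for illustration and are not part of the original answer.

```r
library(data.table)

dt1 <- data.table(place = c("a north", "a south", "b south", "a north", "c west",
                            "b north", "c south", "a west", "b west"))

# Step 1: group directly by the prefix (text before the first space)
# and count the rows in each group with .N.
counts <- dt1[, .N, by = .(var = sub(" .*", "", place))]

# Step 2: turn the long result (one row per prefix) into one wide row,
# using the prefixes as column names.
res <- as.data.table(as.list(setNames(counts$N, counts$var)))

print(res)  # one row with a = 4, b = 3, c = 2
```

If the prefix is not always separated by a space, the pattern passed to `sub()` is the part that would need to change.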