Search code examples
sqlsql-servercountsum

Count frequencies of words separated with multiple spaces


I would like to count the occurrences of all words in a column. The tricky part is that words in a row can appear in long stretches; meaning there are many spaces in-between.

This is a dummy example:

column_name

aaa bbb ccc ddd
[aaa]
bbb
bbb

So far I managed to use the following code

SELECT column_name, 
    SUM(LEN(column_name) - LEN(REPLACE(column_name, ' ', ''))+1) as counts
FROM 
    dbo.my_own
GROUP BY 
    column_name

The code gives me smth like this

column_name         counts

aaa bbb ccc ddd     1
[aaa]               1
bbb                 2

However, my desired output is:

column_name         counts

aaa                 1
[aaa]               1
bbb                 3
ccc                 1
ddd                 1

Solution

  • In SQL Server, you would use string_split():

    select s.value as word, count(*)
    from dbo.my_own o cross apply
         string_split(o.column_name, ' ') s
    where s.value <> ''
    group by s.value;
    

    String manipulation is highly database-dependent. Most databases have some method for doing this, but they can be quite different.