I would like to count the occurrences of all words in a column. The tricky part is that words in a row can appear in long stretches; meaning there are many spaces in-between.
This is a dummy example:
column_name
aaa bbb ccc ddd
[aaa]
bbb
bbb
So far I managed to use the following code
SELECT column_name,
SUM(LEN(column_name) - LEN(REPLACE(column_name, ' ', ''))+1) as counts
FROM
dbo.my_own
GROUP BY
column_name
The code gives me smth like this
column_name counts
aaa bbb ccc ddd 1
[aaa] 1
bbb 2
However, my desired output is:
column_name counts
aaa 1
[aaa] 1
bbb 3
ccc 1
ddd 1
In SQL Server, you would use string_split()
:
select s.value as word, count(*)
from dbo.my_own o cross apply
string_split(o.column_name, ' ') s
where s.value <> ''
group by s.value;
String manipulation is highly database-dependent. Most databases have some method for doing this, but they can be quite different.