I have a Clickhouse table, where one of the fields contains a textual description (~300 words).
For example Reviews:
Rev_id Place_id Stars Category Text
1 12 3 Food Nice food but a bad dirty place.
2 31 4 Sport Not bad, they have everything.
3 55 1 Bar Poor place,bad audience.
I'd like to make some word count analysis, such as general word frequency count (how many times each of the words has appeared) or top-K words per Category.
In the example:
word count
bad 3
place 2
... Is there a way to do it solely in ClickHouse without involving programming languages?
SELECT
arrayJoin(splitByChar(' ', replaceRegexpAll(x, '[.,]', ' '))) AS w,
count()
FROM
(
SELECT 'Nice food but a bad dirty place.' AS x
UNION ALL
SELECT 'Not bad, they have everything.'
UNION ALL
SELECT 'Poor place,bad audience.'
)
GROUP BY w
ORDER BY count() DESC
┌─w──────────┬─count()─┐
│ │ 4 │
│ bad │ 3 │
│ place │ 2 │
│ have │ 1 │
│ Poor │ 1 │
│ food │ 1 │
│ Not │ 1 │
│ they │ 1 │
│ audience │ 1 │
│ Nice │ 1 │
│ but │ 1 │
│ dirty │ 1 │
│ a │ 1 │
│ everything │ 1 │
└────────────┴─────────┘
SELECT CATEGORY, ....
GROUP BY CATEGORY, w