count amazon-redshift field string-matching

How can I add a field that counts items in another field in Redshift

I am looking for a way to list multiple counts of various groups of content from a row using Redshift. For example, I have a row containing process, md5, and path. I have a query to count each instance of process, md5, and path, but I also want to count matching items from a single column in the same query.

select process, path, md5, count(*) as cnt
from process
where path ilike '%:\\\\Windows\\\\System32\\\\%'
group by 1,2,3

The above query will result in the following data. The count in this table is the count of matching the process, path, and md5.

I want to add a column that also counts the process. You see the process wusa.exe has five rows, but there are 7 instances of that process.

I have tried various queries so far and have not been successful. Most of my queries result in either a count of all process or the same value as the PPM-Count column. I tried to sum the PPM-Count column based on matching processes.

Solution

You can use window functions; here, we can take the sum of the counts of all (groups of) rows that share the same process name:

select process, path, md5, 
    count(*) as cnt,
    sum(count(*)) over(partition by process) cnt2
from process
where path ilike '%. . .%'
group by 1,2,3