I am looking for a way to list multiple counts of various groups of content from a row using Redshift. For example, I have a row containing process, md5, and path. I have a query to count each instance of process, md5, and path, but I also want to count matching items from a single column in the same query.
select process, path, md5, count(*) as cnt
from process
where path ilike '%:\\\\Windows\\\\System32\\\\%'
group by 1,2,3
The above query will result in the following data. The count in this table is the count of matching the process, path, and md5.
I want to add a column that also counts the process. You see the process wusa.exe has five rows, but there are 7 instances of that process.
I have tried various queries so far and have not been successful. Most of my queries result in either a count of all process or the same value as the PPM-Count column. I tried to sum the PPM-Count column based on matching processes.
You can use window functions; here, we can take the sum of the counts of all (groups of) rows that share the same process name:
select process, path, md5,
count(*) as cnt,
sum(count(*)) over(partition by process) cnt2
from process
where path ilike '%. . .%'
group by 1,2,3