Tags: count, amazon-redshift, field, string-matching

How can I add a field that counts items in another field in Redshift?


I am looking for a way to list several counts over different groupings of columns in a single Redshift query. For example, I have a table with the columns process, md5, and path. I have a query that counts each distinct combination of process, md5, and path, but I also want to count the rows that match on a single column in the same query.

select process, path, md5, count(*) as cnt
from process
where path ilike '%:\\\\Windows\\\\System32\\\\%'
group by 1,2,3

The above query produces the following data. The count column is the number of rows that share the same process, path, and md5.

[image: query output]

I want to add a column that also counts the process on its own. For example, the process wusa.exe appears in five rows of the result, but there are seven instances of that process in total.

[image: desired output with the additional process count column]

I have tried various queries so far without success. Most of them return either a count of all processes or the same value as the PPM-Count column. I also tried to sum the PPM-Count column over matching processes.


Solution

  • You can use window functions. Because window functions are evaluated after the group by, we can take the sum of the counts of all groups of rows that share the same process name:

    select process, path, md5, 
        count(*) as cnt,
        sum(count(*)) over(partition by process) cnt2
    from process
    where path ilike '%. . .%'
    group by 1,2,3
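To try the idea without a Redshift cluster, here is a minimal sketch that runs the same kind of query against an in-memory SQLite database from Python. Everything in it is an assumption for illustration: the sample rows, paths, and md5 values are made up, SQLite's like (case-insensitive for ASCII by default) stands in for Redshift's ilike, and the pattern is passed as a parameter rather than written with Redshift's backslash escaping. The grouping is done in a subquery and the window sum is applied to its cnt column, which should give the same result as sum(count(*)) over(partition by process) in the answer above. It needs an SQLite build with window function support (3.25 or later).

```python
import sqlite3

# Hypothetical sample data: 7 matching wusa.exe rows spread over 5 distinct
# (path, md5) groups, 2 cmd.exe rows in one group, and 1 row outside System32.
sample = (
    [("wusa.exe", r"C:\Windows\System32\wusa.exe", "aaa")] * 3
    + [("wusa.exe", r"C:\Windows\System32\wusa.exe", "bbb")]
    + [("wusa.exe", r"C:\Windows\System32\wusa.exe", "ccc")]
    + [("wusa.exe", r"D:\Windows\System32\wusa.exe", "aaa")]
    + [("wusa.exe", r"D:\Windows\System32\wusa.exe", "ddd")]
    + [("cmd.exe", r"C:\Windows\System32\cmd.exe", "fff")] * 2
    + [("wusa.exe", r"C:\Temp\wusa.exe", "eee")]  # filtered out by the where clause
)

conn = sqlite3.connect(":memory:")
conn.execute("create table process (process text, path text, md5 text)")
conn.executemany("insert into process values (?, ?, ?)", sample)

query = """
select process, path, md5, cnt,
       sum(cnt) over (partition by process) as cnt2  -- total per process
from (
    select process, path, md5, count(*) as cnt       -- count per group
    from process
    where path like ?
    group by process, path, md5
)
order by process, path, md5
"""

# SQLite does not treat backslash as an escape character here,
# so the pattern is written with single backslashes.
rows = conn.execute(query, (r"%:\Windows\System32\%",)).fetchall()

for row in rows:
    print(row)

conn.close()
```

With this sample data, each wusa.exe group keeps its own cnt, while cnt2 repeats the per-process total on every row of that process. The row under C:\Temp is excluded before grouping, so it does not contribute to either count.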