Search code examples
sqlsql-servereventscombinationstagging

Need solution to avoid repeated scanning in huge table


I have a event table which has 40 columns and fill up to 2 billion records. In that event table i would like to query for a combination event i.e Event A with Event B. Sometimes I may want to find more combination like Event A with B and C. It may goes to 5 or 6 combination.

I don't want to scan that table for every event in combination i.e Scanning for event A and scanning for event B. And I need a generic approach for more combination scanning as well.

Note: That 2 billion records is partitioned based on event date and data is been equally split.

Eg:

Sample Data

Need to find id's which has event A,B,C and need to find id's which has only A,B.

This number of combination is dynamic. I don't want to scan that table for each event and finally intersect the result.


Solution

  • There may be some mileage in using a sql server equivalent of the mysql group_concat function. For example

    drop table t
    create table t (id int, dt date, event varchar(1))
    insert into t values
    (1,'2017-01-01','a'),(1,'2017-01-01','b'),(1,'2017-01-01','c'),(1,'2017-01-02','c'),(1,'2017-01-03','d'),
    (2,'2017-02-01','a'),(2,'2017-02-01','b')
    
    select id,
            stuff(
        (
        select cast(',' as varchar(max)) + t1.event
        from t as t1
        WHERE t1.id  = t.id
        order by t1.id
        for xml path('')
        ), 1, 1, '') AS groupconcat
    from t
    group by t.id
    
    Results in
    
    id          groupconcat
    ----------- -----------
    1           a,b,c,c,d
    2           a,b
    

    If you then add a patindex

    select * from
    (
    select id,
            stuff(
        (
        select cast(',' as varchar(max)) + t1.event
        from t as t1
        WHERE t1.id  = t.id
        order by t1.id
        for xml path('')
        ), 1, 1, '') AS groupconcat
    from t
    group by t.id
    ) s
    where patindex('a,b,c%',groupconcat) > 0 
    

    you get this

    id          groupconcat
    ----------- ------------
    1           a,b,c,c,d