Search code examples
kdbqsqlqueryqsqldatabaseqsqlquerymodel

Conditional Group By on Selected rows in KDB/Q-Sql


I have a requirement where I have to execute multiple queries and perform group by on a column with where clause , group by column is fixed and where condition will be perform on fixed column with variable criteria . Only Column name and aggregation type will be varies For example if I have table :

k1  k2  val1    val2
1   1   10      30
1   1   20      31
1   2   30      32
2   2   40      33
2   3   50      34
2   4   60      35
2   4   70      36
3   4   80      37
3   5   90      38
3   5   100     39


t:([] k1:1 1 1 2 2 2 2 3 3 3; k2:1 1 2 2 3 4 4 4 5 5; val1:10 20 30 40 50 60 70 80 90 100; val2:31 31 32 33 34 35 36 37 38 39)

Queries which I need to perform will be like

 select avg_val1:avg val1 by k1 from t where k2 in 2 3 4
 select sum_val1:sum val1 by k1 from t where k2 in 2 3 
 select sum_val2:sum val2 by k1 from t where k2 in 2 3 5
 select min_val2:min val2 by k1 from t where k2 in 1 2 3 4 5

I want to execute these queries in a single execution using functional queries. I tried this, but not able to put right condition and syntax

res:?[t;();(enlist`k1)!enlist`k1;(`avg_val1;`sum_val2)!({$[x; y; (::)]}[1b;(avg;`val1)];{$[x; y; (::)]}[1b; (sum;`val2)])];

k1  avg_val1 sum_val2
1   20.0      94
2   55.0      138
3   90.0      114

Instead putting 1b in condition , i want to put real condition like this:

res:?[t;();(enlist`k1)!enlist`k1;(`avg_val1;`sum_val2)!({$[x; y; (::)]}[(in;`k2;2 3 4i);(avg;`val1)];{$[x; y; (::)]}[(in;`k2;2 3i); (sum;`val2)])];

But it will give "type" error, since query will be first group by k1 ,and k2 will be list. So condition is also not right.

I want to know what can be the best solution for this. May be there can be better approach to solve the same . Please help me to in same.

Thank you.


Solution

  • The vector conditional (?) operator can get you closer to what you'd like.

    Given your table

    t:([] k1:1 1 1 2 2 2 2 3 3 3; k2:1 1 2 2 3 4 4 4 5 5; val1:10 20 30 40 50 60 70 80 90 100; val2:31 31 32 33 34 35 36 37 38 39)
    k1 k2 val1 val2
    ---------------
    1  1  10   31  
    1  1  20   31  
    1  2  30   32  
    2  2  40   33  
    2  3  50   34  
    2  4  60   35  
    2  4  70   36  
    3  4  80   37  
    3  5  90   38  
    3  5  100  39  
    

    you can update, say, the val1 column to hold null values wherever a condition does not hold

    update val1:?[k2 in 2 3 4;val1;0N] from t
    k1 k2 val1 val2
    ---------------
    1  1       31  
    1  1       31  
    1  2  30   32  
    2  2  40   33  
    2  3  50   34  
    2  4  60   35  
    2  4  70   36  
    3  4  80   37  
    3  5       38  
    3  5       39  
    

    and with a little more work you can get the desired aggregate (NB: the aggregate functions ignore null values)

    select avg ?[k2 in 2 3 4;val1;0N] by k1 from t
    k1| x 
    --| --
    1 | 30
    2 | 55
    3 | 80
    

    You can wrap this up into a functional select statement like so

    ?[t;();{x!x}enlist`k1;`avg_val1`sum_val2!((avg;(?;(in;`k2;2 3 4);`val1;0N));(sum;(?;(in;`k2;2 3);`val2;0N)))]
    k1| avg_val1 sum_val2
    --| -----------------
    1 | 30       32      
    2 | 55       67      
    3 | 80       0       
    

    However, this can break when you use an function that does not ignore nulls, e.g. count. You may be better off using the where operator in you select statement:

    select avg val1 where k2 in 2 3 4 by k1 from t
    k1| x 
    --| --
    1 | 30
    2 | 55
    3 | 80
    
    ?[t;();{x!x}enlist`k1;`avg_val1`sum_val2!((avg;(`val1;(where;(in;`k2;2 3 4))));(sum;(`val2;(where;(in;`k2;2 3)))))]
    k1| avg_val1 sum_val2
    --| -----------------
    1 | 30       32      
    2 | 55       67      
    3 | 80       0