Search code examples
kdb

reshaping table based on column values


I was looking at a problem of reshaping a table creating new columns according based on values. I'm using the same example as this problem discussed there: A complicated sum in R data.table that involves looking at other columns

so I have a table:

df:([]ID:1+til 5;
        Group:1 1 2 2 2;
        V1:10 + 2 * til 5;
        Type_v1:`t1`t2`t1`t1`t2;
        V2:3 0N 0N 7 8;
        Type_v2:`t2```t3`t3);

ID Group V1 Type_v1 V2 Type_v2
------------------------------
1  1     10 t1      3  t2     
2  1     12 t2                
3  2     14 t1                
4  2     16 t1      7  t3     
5  2     18 t2      8  t3     

and the goal is to transform it to get the sum of values by group and type. please note the new columns created. basically all types in Type_v1 and Type_v2 are used to create columns for the resulting table.

#   group v_1 type_1 v_2 type_2 v_3 type_3
#1:     1  10     t1  15     t2  NA   <NA>
#2:     2  30     t1  18     t2  15     t3

I did the beginning but I am unable to transform the table and create the new columns. also of course I'm trying to get all the columns created in a dynamic way, as it would not be possible to input 20k columns manually.

df1:select Group, Value:V1, Type:Type_v1 from df;
df2:select Group, Value:V2, Type:Type_v2 from df;
tr:df1,df2;
tr:0!select sum Value by Group, Type from tr where Type <> ` ;

basically I'm missing the equivalent of:
dcast(tmp, group ~ rowid(group), value.var = c("v", "type"))

any help and explanations appreciated,


Solution

  • You could expand on Terry's pivot to dynamically do the select parts above using functional form. See more detail here:

    https://code.kx.com/q/basics/funsql/

    // Personally, I would try to stay clear of column names too similar to reserved keywords in kdb
    df: `id`grpCol`v_1`typCol_1`v_2`typCol_2 xcol df;
    
    {[df;n]
       
        // dynamically create cols from 1 to n
        cls:`$("v_";"typCol_"),\:/:string 1 + til n;
    
        // functional form of select for each type/value col before joining together
        df:(,/) {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls;
    
        // sum, then pivot
        df:0!select sum v by grpCol, typCol from df where typCol <> `;
        P:exec distinct typCol from df;
        df:exec P#(typCol!v) by grpCol:grpCol from df;
    
        // Type cols seem unnecessary but 
        // Can be done with another functional select
        ?[df;();0b;(`grpCol,raze P,'`$"typCol_",/:string 1 + til count P)!`grpCol,raze flip (P;enlist each P)]
    
        }[df;2]
    
    grpCol  t1  typCol_1    t2  typCol_2    t3  typCol_3
    1       10  t1          15  t2          0N  t3
    2       30  t1          18  t2          15  t3
    

    EDIT - More detailed breakdown below:

    cls:`$("v_";"typCol_") ,\:/: string 1 + til n;
    

    Dynamically create a symbol list for the columns as they are required for column names when using functional form. I start by creating a list of v_ and typCol_ up to number n.

    ,\:/: -> join with each left and each right iterators

    https://code.kx.com/q/ref/maps/#each-left-and-each-right

    This allows me to join every item on the left ("v_";"typCol_") with every item on the right.

    The same could be achieved with cross but you would have to restructure the list with flip and cut flip n cut `$("v_";"typCol_") cross string 1 + til n

    (,/) {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls; 
    

    (,/) -> This is the over iterator used with join. It takes the 1st table, joins it to the 2nd, then takes that and joins on to the 3rd etc.

    https://code.kx.com/q/ref/over/

    {?[x;();0b;`grpCol`v`typCol!`grpCol,y]}[df] each cls
    
    // functional select
    
    ?[table;  where;  by;  columns]
    ?[x;      ();     0b;  `grpCol`v`typCol!`grpCol,y]
    

    This creates a list of tables, 1 for each column pair in the cls variable. Notice how I don't explicitly state x or y in the function like this {[x;y]}. This is because x y and z can be used implicitly, so this function works with or without.

    The important part here is the last param (columns). For a functional select it is a dictionary with column names as the key and what the columns are as the values

    e.g. `grpCol`v`typCol!`grpCol`v_1`typCol_1 -> this is renaming each v and typCol so they are the same to then join them all together with (,/).

    There is a useful keyword to help with figuring out functional form -> parse

    parse"select Group, Value:V1, Type:Type_v1 from df"
    
    0   ?
    1   `df
    2   ()
    3   0b
    4   (`Group`Value`Type)!`Group`V1`Type_v1
    
        P:exec distinct typCol from df;
        df:exec P#(typCol!v) by grpCol:grpCol from df;
    

    pivoting is outlined here: https://code.kx.com/q/kb/pivoting-tables/

    It effectively flips/rotates a section of the table. It takes the distinct types from typCol as the columns and uses the v column as the rows for each corresponding typCol

    ?[table;  where;  by;  columns]
    ?[df;();0b;(`grpCol,raze P,'`$"typCol_",/:string 1 + til count P)!`grpCol,raze flip (P;enlist each P)]
    

    Again look at the last param in the functional select i.e. columns. This is how it looks after being dynamically generated:

    (`grpCol`t1`typCol_1`t2`typCol_2`t3`typCol_3)!(`grpCol;`t1;enlist `t1;`t2;enlist `t2;`t3;enlist `t3)

    It is kind of a hacky way to get the type columns, I select each t1 t2 t3 with a typeCol_1 _2 _3,

    `t1 = (column) `t1
    `typCol_1 = enlist `t1 -> the enlist here tells kdb I want the value `t1 rather than the column