All, I'm having trouble solving for what I believe to be a fairly straightforward task to search a table, identify a point and then truncate or delete the subsequent rows for a set of data within a table. I believe I need a nested function in my update query however I have not been successful writing one. I've also tried to create a "delete_me" column as well which will allow me to identify and then run a single delete which may be faster and better for auditing code as well.
Ideally, I'd like to wrap this in a callable function as there are a few different methods of truncation.
In my example below, I identify the maximum cumulative value date and then label the subsequent dated rows by id for eventual deletion.
///raw data for copy and paste - `:./Data/sample.csv;
id,idate,a,b,c
AAA,1/31/2014,1000,500,500
AAA,2/28/2014,900,500,50
AAA,3/31/2014,850,500,0
AAA,4/30/2014,800,500,0
AAA,5/31/2014,750,500,0
AAA,6/30/2014,700,500,0
AAA,7/31/2014,650,500,0
AAA,8/31/2014,550,500,0
AAA,9/30/2014,500,500,0
AAA,10/31/2014,450,500,0
BBB,6/30/2012,1000,500,2500
BBB,7/31/2012,950,500,75
BBB,8/31/2012,900,500,0
BBB,9/30/2012,850,500,0
BBB,10/31/2012,800,500,0
BBB,11/30/2012,750,500,0
BBB,12/31/2012,700,500,0
BBB,1/31/2013,650,500,0
BBB,2/28/2013,600,500,0
BBB,3/31/2013,550,500,0
BBB,4/30/2013,500,500,0
BBB,5/31/2013,450,500,0
BBB,6/30/2013,400,500,0
CCC,1/1/2016,1000,500,1200
CCC,2/29/2016,950,500,30
CCC,3/31/2016,900,500,0
CCC,4/30/2016,850,500,0
CCC,5/31/2016,800,500,0
CCC,6/30/2016,750,500,0
CCC,7/31/2016,700,500,0
CCC,8/31/2016,650,500,0
CCC,9/30/2016,600,500,0
CCC,10/31/2016,550,500,0
CCC,11/30/2016,500,500,0
CCC,12/31/2016,450,500,0
CCC,1/31/2017,400,500,0
CCC,2/28/2017,350,500,0
CCC,3/31/2017,300,500,0
CCC,4/30/2017,250,500,0
Load data and add some calculations
\c 100 150i
t:("SSFFF";enlist",") 0:`:./Data/sample.csv;
t: update kdbDate: "D"$string idate, d:(a-(b+c)),cum_d: sums (a-(b+c)) from t;
t:![t; (); (enlist`id)!enlist`id; (enlist`maxCum_d)!enlist(max;`cum_d)];
t:![t; enlist(=;`maxCum_d;`cum_d); (enlist`id)!enlist`id; (enlist `date_cutoff)!enlist(*:;`kdbDate)];
Below is where I'm presently stuck. I've also thought of using fills
to just fill in the date_cutoff
for the remaining rows per id
as well and avoid creating another column altogether.
show exec max(date_cutoff) by id from t;
assignDelete:{[t] update del: `delete_me by id from t where max (date_cutoff) > kdbDate}; //<--STUCK--
t: assignDelete over t;
t:![t; enlist (~:;(^:;`del)); 0b; `symbol$()] ; //delete from t where not null `del
Many thanks in advance! Desired output below
q)t
id idate a b c kdbDate d cum_d maxCum_d date_cutoff del
----------------------------------------------------------------------------------
AAA 1/31/2014 1000 500 500 2014.01.31 0 0 1650
AAA 2/28/2014 900 500 50 2014.02.28 350 350 1650
AAA 3/31/2014 850 500 0 2014.03.31 350 700 1650
AAA 4/30/2014 800 500 0 2014.04.30 300 1000 1650
AAA 5/31/2014 750 500 0 2014.05.31 250 1250 1650
AAA 6/30/2014 700 500 0 2014.06.30 200 1450 1650
AAA 7/31/2014 650 500 0 2014.07.31 150 1600 1650
AAA 8/31/2014 550 500 0 2014.08.31 50 1650 1650 2014.08.31
AAA 9/30/2014 500 500 0 2014.09.30 0 1650 1650 2014.08.31 delete_me
AAA 10/31/2014 450 500 0 2014.10.31 -50 1600 1650 delete_me
BBB 6/30/2012 1000 500 2500 2012.06.30 -2000 -400 1775
BBB 7/31/2012 950 500 75 2012.07.31 375 -25 1775
BBB 8/31/2012 900 500 0 2012.08.31 400 375 1775
BBB 9/30/2012 850 500 0 2012.09.30 350 725 1775
BBB 10/31/2012 800 500 0 2012.10.31 300 1025 1775
BBB 11/30/2012 750 500 0 2012.11.30 250 1275 1775
BBB 12/31/2012 700 500 0 2012.12.31 200 1475 1775
BBB 1/31/2013 650 500 0 2013.01.31 150 1625 1775
BBB 2/28/2013 600 500 0 2013.02.28 100 1725 1775
BBB 3/31/2013 550 500 0 2013.03.31 50 1775 1775 2013.03.31
BBB 4/30/2013 500 500 0 2013.04.30 0 1775 1775 2013.03.31 delete_me
BBB 5/31/2013 450 500 0 2013.05.31 -50 1725 1775 delete_me
BBB 6/30/2013 400 500 0 2013.06.30 -100 1625 1775 delete_me
CCC 1/1/2016 1000 500 1200 2016.01.01 -700 925 3145
CCC 2/29/2016 950 500 30 2016.02.29 420 1345 3145
CCC 3/31/2016 900 500 0 2016.03.31 400 1745 3145
CCC 4/30/2016 850 500 0 2016.04.30 350 2095 3145
CCC 5/31/2016 800 500 0 2016.05.31 300 2395 3145
CCC 6/30/2016 750 500 0 2016.06.30 250 2645 3145
CCC 7/31/2016 700 500 0 2016.07.31 200 2845 3145
CCC 8/31/2016 650 500 0 2016.08.31 150 2995 3145
CCC 9/30/2016 600 500 0 2016.09.30 100 3095 3145
CCC 10/31/2016 550 500 0 2016.10.31 50 3145 3145 2016.10.31
CCC 11/30/2016 500 500 0 2016.11.30 0 3145 3145 2016.10.31 delete_me
CCC 12/31/2016 450 500 0 2016.12.31 -50 3095 3145 delete_me
CCC 1/31/2017 400 500 0 2017.01.31 -100 2995 3145 delete_me
CCC 2/28/2017 350 500 0 2017.02.28 -150 2845 3145 delete_me
CCC 3/31/2017 300 500 0 2017.03.31 -200 2645 3145 delete_me
CCC 4/30/2017 250 500 0 2017.04.30 -250 2395 3145 delete_me
[EDIT] using fills
on another column seemed to work okay.
Note truncation after the max(cum_d)
t: update del:fills date_cutoff by id from t where kdbDate>date_cutoff;
or in functional form
t: ![t; enlist(>;`kdbDate;`date_cutoff);(enlist`id)!enlist`id;(enlist`del)! enlist (^\;`date_cutoff)];
id idate a b c kdbDate d cum_d maxCum_d date_cutoff del
----------------------------------------------------------------------------
AAA 1/31/2014 1000 500 500 2014.01.31 0 0 1650
AAA 2/28/2014 900 500 50 2014.02.28 350 350 1650
AAA 3/31/2014 850 500 0 2014.03.31 350 700 1650
AAA 4/30/2014 800 500 0 2014.04.30 300 1000 1650
AAA 5/31/2014 750 500 0 2014.05.31 250 1250 1650
AAA 6/30/2014 700 500 0 2014.06.30 200 1450 1650
AAA 7/31/2014 650 500 0 2014.07.31 150 1600 1650
AAA 8/31/2014 550 500 0 2014.08.31 50 1650 1650 2014.08.31
BBB 6/30/2012 1000 500 2500 2012.06.30 -2000 -400 1775
BBB 7/31/2012 950 500 75 2012.07.31 375 -25 1775
BBB 8/31/2012 900 500 0 2012.08.31 400 375 1775
BBB 9/30/2012 850 500 0 2012.09.30 350 725 1775
BBB 10/31/2012 800 500 0 2012.10.31 300 1025 1775
BBB 11/30/2012 750 500 0 2012.11.30 250 1275 1775
BBB 12/31/2012 700 500 0 2012.12.31 200 1475 1775
BBB 1/31/2013 650 500 0 2013.01.31 150 1625 1775
BBB 2/28/2013 600 500 0 2013.02.28 100 1725 1775
BBB 3/31/2013 550 500 0 2013.03.31 50 1775 1775 2013.03.31
CCC 1/1/2016 1000 500 1200 2016.01.01 -700 925 3145
CCC 2/29/2016 950 500 30 2016.02.29 420 1345 3145
CCC 3/31/2016 900 500 0 2016.03.31 400 1745 3145
CCC 4/30/2016 850 500 0 2016.04.30 350 2095 3145
CCC 5/31/2016 800 500 0 2016.05.31 300 2395 3145
CCC 6/30/2016 750 500 0 2016.06.30 250 2645 3145
CCC 7/31/2016 700 500 0 2016.07.31 200 2845 3145
CCC 8/31/2016 650 500 0 2016.08.31 150 2995 3145
CCC 9/30/2016 600 500 0 2016.09.30 100 3095 3145
CCC 10/31/2016 550 500 0 2016.10.31 50 3145 3145 2016.10.31
For this solution I've left-joined date_cutoff
by id
to the table so that all date_cutoff
entries are non-null, then used a vector conditional to determine whether to delete or not.
q)t:t lj select last date_cutoff by id from t where not null date_cutoff
q)update del:?[date_cutoff<kdbDate;`delete_me;`]from t
So long as there is only one distinct date_cutoff
within an id
grouping, this should work.