I'm looking to find transactions (contiguous?) in a data set of shops which follows the trend that they eventually complete a transaction despite a few prior cancellations in a day.
A valid batch transaction must meet a set of criteria.
X
amount of cancellation but 1
completion.The query should retain all batches which meet the above criteria. The final table should have a column batch
with the running total of the batches to differentiate them.
Initial Table:
shop amount status date
------------------------------
A 1234 Cancelled 20101010
A 1234 Cancelled 20101010
A 1234 Completed 20101010
A 1234 Cancelled 20101010
A 1234 Completed 20101011
A 1000 Completed 20101011
B 100 Cancelled 20101011
B 100 Cancelled 20101011
B 4321 Cancelled 20101011
B 4321 Cancelled 20101011
C 333 Cancelled 20101012
C 333 Completed 20101012
C 333 Completed 20101012
D 111 Cancelled 20101013
D 155 Cancelled 20101013
D 111 Completed 20101013
D 155 Completed 20101013
Delineated By Days:
shop amount status date
------------------------------
A 1234 Cancelled 20101010
A 1234 Cancelled 20101010
A 1234 Completed 20101010
A 1234 Cancelled 20101010
------------------------------
A 1234 Completed 20101011
A 1000 Completed 20101011
B 100 Cancelled 20101011
B 100 Cancelled 20101011
B 4321 Cancelled 20101011
B 4321 Cancelled 20101011
------------------------------
C 333 Cancelled 20101012
C 333 Completed 20101012
C 333 Completed 20101012
------------------------------
D 111 Cancelled 20101013
D 155 Cancelled 20101013
D 111 Completed 20101013
D 155 Completed 20101013
Resultant Table :
shop amount status date batch
-------------------------------------
A 1234 Cancelled 20101010 1
A 1234 Cancelled 20101010 1
A 1234 Completed 20101010 1
-------------------------------------
A 1234 Completed 20101011 2
A 1000 Completed 20101011 3
-------------------------------------
C 333 Cancelled 20101012 4
C 333 Completed 20101012 4
C 333 Completed 20101012 5
-------------------------------------
D 111 Cancelled 20101013 6
D 155 Cancelled 20101013 7
D 111 Completed 20101013 6
D 155 Completed 20101013 7
Table Query:
([] shop:`A`A`A`A`A`A`B`B`B`B`C`C`C`D`D`D`D; amount: 1234 1234 1234 1234 1234 1000 100 100 4321 4321 333 333 333 111 155 111 155; status:`Cancelled`Cancelled`Completed`Cancelled`Completed`Completed`Cancelled`Cancelled`Cancelled`Cancelled`Cancelled`Completed`Completed`Cancelled`Cancelled`Completed`Completed; date: `20101010`20101010`20101010`20101010`20101011`20101011`20101011`20101011`20101011`20101011`20101012`20101012`20101012`20101013`20101013`20101013`20101013)
Explanation:
On the first day, A makes 4 transactions. The first three are batched together as they have the same amount [cancelled -> cancelled -> completed]. The last one transaction is ignored as it is the end of day.
On the second day, A makes a transaction of the same amount of 1234
but it does not take the previous day transaction as part of its batch. A completes another transaction of 1000
. B make four transaction but they not tracked as they are a) cancelled or b) powers of ten.
On the third day, C makes three transaction of the same amount. This is considered two batches since the first cancellation and completion forms the initial batch, and the the final completed transaction is a batch by its own.
On the fourth day, D makes four transactions and form two batches. Note that the transaction are not contiguous here since there are two cancelled transaction with varying amount, but both are completed in the future.
Table is ordered by timestamp and date i.e. 23:59:59 to 00:00:00. Query need not be a one-liner and can be a multi line query writing to any temp table/variable etc.
Additionally, if there's a way to get number of cancelled transaction per batch that will be helpful.
So first count the number of batches that are completed.
q)n:count select from tab where status=`Completed
Then use the below query to assign the batch numbers to each Completed row
q)btab:update batch:1+til n from tab where status=`Completed
q)btab
shop amount status date batch
------------------------------------
A 1234 Cancelled 20101010
A 1234 Cancelled 20101010
A 1234 Completed 20101010 1
A 1234 Cancelled 20101010
A 1234 Completed 20101011 2
A 1000 Completed 20101011 3
B 100 Cancelled 20101011
B 100 Cancelled 20101011
B 4321 Cancelled 20101011
B 4321 Cancelled 20101011
C 333 Cancelled 20101012
C 333 Completed 20101012 4
C 333 Completed 20101012 5
D 111 Cancelled 20101013
D 155 Cancelled 20101013
D 111 Completed 20101013 6
D 155 Completed 20101013 7
Then reverse the table to fill forwards the nulls by date,shop and amount and reverse back and remove any Cancellations that are powers of 10 (using same logic as terrylynch)
q)ftab:reverse update fills batch by date,shop,amount from reverse btab where not (status=`Cancelled)&{x=`int$x}10 xlog amount
q)ftab
shop amount status date batch
------------------------------------
A 1234 Cancelled 20101010 1
A 1234 Cancelled 20101010 1
A 1234 Completed 20101010 1
A 1234 Cancelled 20101010
A 1234 Completed 20101011 2
A 1000 Completed 20101011 3
B 100 Cancelled 20101011
B 100 Cancelled 20101011
B 4321 Cancelled 20101011
B 4321 Cancelled 20101011
C 333 Cancelled 20101012 4
C 333 Completed 20101012 4
C 333 Completed 20101012 5
D 111 Cancelled 20101013 6
D 155 Cancelled 20101013 7
D 111 Completed 20101013 6
D 155 Completed 20101013 7
And then select from the table and pull data that has batch numbers
q)stab:select from ftab where batch<>0N
q)stab
shop amount status date batch
------------------------------------
A 1234 Cancelled 20101010 1
A 1234 Cancelled 20101010 1
A 1234 Completed 20101010 1
A 1234 Completed 20101011 2
A 1000 Completed 20101011 3
C 333 Cancelled 20101012 4
C 333 Completed 20101012 4
C 333 Completed 20101012 5
D 111 Cancelled 20101013 6
D 155 Cancelled 20101013 7
D 111 Completed 20101013 6
D 155 Completed 20101013 7
q)
Finally here is a query to get the number of cancellations per batch
q)select numberOfCancellations:-1+count batch by batch from stab
batch| numberOfCancellations
-----| ---------------------
1 | 2
2 | 0
3 | 0
4 | 1
5 | 0
6 | 1
7 | 1