Search code examples
sqlkdb

In kdb, is there an equivalent of left join on multiple columns with inequalities (like a modifiable version of asof joins)?


I have an order table in kdb which has order ids, order entry time and order deletion time, as shown below:

+----+-------------------------+-------------------------+
| id |        entrytime        |       deletetime        |
+----+-------------------------+-------------------------+
|  1 | 2020.01.01T00:00:00.000 | 2020.01.01T04:00:00.000 |
|  2 | 2020.01.01T00:00:00.000 | 2020.01.01T03:59:58.000 |
|  3 | 2020.01.01T00:00:00.000 | 2020.01.01T00:00:02.000 |
|... | ...                     | ...                     |
+----+-------------------------+-------------------------+

I would like to convert this into view that shows open/active orders every second, as shown below:

+-------------------------+---------+
|        datetime         | openids |
+-------------------------+---------+
| 2020.01.01T00:00:00.000 | 1 2 3   |
| 2020.01.01T00:00:01.000 | 1 2 3   |
| 2020.01.01T00:00:02.000 | 1 2     |
| 2020.01.01T00:00:03.000 | 1 2     |
| ...                     | ...     |
| 2020.01.01T03:59:57.000 | 1 2     |
| 2020.01.01T03:59:58.000 | 1       |
| 2020.01.01T03:59:59.000 | 1       |
| 2020.01.01T04:00:00.000 |         |
+-------------------------+---------+

What I need is an equivalent of SELECT * FROM table1 LEFT JOIN ON datetime >= entrytime and datetime < deletetime in kdb.

I have tried window joins but with little success.

I would really appreciate any help on this.


Solution

  • There are many options to do this. Please, see one of them below. But to start with I would convert times to timestamp type, datetime is hard to deal with and considered obsolete.

    t: ([] id: 1 2 3
        ; entrytime: 3#2020.01.01D
        ; deletetime: 2020.01.01D04:00:00 2020.01.01D03:59:58 2020.01.01D00:00:02);
    t: update datetime: {x+00:00:01*til `int$1e-9*y-x}'[entrytime;deletetime] from t;
    t: ungroup t;
    t: select openid: id by datetime from t;
    timetable: ([]datetime: {x+00:00:01*til 2+`int$1e-9*y-x} . value exec min datetime
        , max datetime from t);
    t: timetable lj t;
    t
    

    Explanation:

    1. Add datetime column to table t which keeps track of all times when order is available with with one second step
    2. ungroup table t to get id<->available time mapping
    3. group table t by available time. This already gives result you are looking for. But there could be gaps for times when no orders were available
    4. Fill gaps by joining timetable, which has all times in orders' create/delete times range, with table t.