Search code examples
databasekdb

Querying KDB+ Server Using Python


I'm trying to process data that is stored on KDB+ server using Jupyter notebook. This is how I connected to the server:

connection = kx.QConnection(host='localhost', port=5000)

Then O read a csv file using:

table1 = connection.read.csv("path/to/file", schema)

After that, I wrote the table1 to the database:

connection('insert', 'table1', conn_table1)

I need to implement the following which is not working:

start_date="3/6/2015 12:44:31"
end_date="3/6/2020 12:44:31"
connection(f'select col1 datetimecol2 by col1 from table1 where col3<`2  & datetimecol2 within (`${start_date};`${end_date})')

How can I do it using KDB+? And where can I find a good resource to learn KDB+?

Thank you!

I expected to get some results but instead I got an error: QError: /


Solution

  • Your query builds to:

    select col1 datetimecol2 by col1 from table1 
         where col3<`2  & datetimecol2 within (`$3/6/2015 12:44:31;`$3/6/2020 12:44:31)
    

    The issues:

    1. You do not have a , between the column names col1 datetimecol2, this would cause datetimecol2 to attempt to be used to index in to col1 and is not what you intended to do I am guessing. If datetimecol2 is a symbol type this operation would fail with type
    2. You have col1 to the left and right of by, this is not allowed and would error with 'dup names for cols/groups col1
    3. You are attempting to create symbols with special characters in them ( and /). These must be wrapped in ""
    4. You are using col3<`2 doing a less than operation on a symbol datatype. While this can technically work it's more likely you want a numeric column and comparison
    5. In a where clause you don't use & by default. A , separates where clauses

    Implementing fixes 1-5 gives a workable query:

    select datetimecol2 by col1 from table1 
         where col3<2, datetimecol2 within (`$"3/6/2015 12:44:31";`$"3/6/2020 12:44:31")
    
    1. You are using the symbol datatype for timestamps, this is inefficient

    Your timestamp format can be parsed by q:

    q)"P"$"3/6/2015 12:44:31"
    2015.03.06D12:44:31.000000000
    

    So in your schema you should use more similar to:

    schema = {'col3':kx.LongAtom,
              'datetimecol2':kx.TimestampAtom,
               }
    

    Then your query should pass parameters and not build up complex strings:

    from datetime import datetime
    start_date = datetime.strptime("3/6/2015 12:44:31",'%m/%d/%Y %H:%M:%S')
    end_date=datetime.strptime("3/6/2020 12:44:31",'%m/%d/%Y %H:%M:%S')
    
    connection('{select datetimecol2 by col1 from table1 where
                                col3<2, datetimecol2 within (x;y)}',
                  start_date, end_date)
    

    See:

    The KX Academy has courses you can follow to learn q/kdb+ and PyKX https://learninghub.kx.com/academy/