I'm trying to process data that is stored on KDB+ server using Jupyter notebook. This is how I connected to the server:
connection = kx.QConnection(host='localhost', port=5000)
Then O read a csv file using:
table1 = connection.read.csv("path/to/file", schema)
After that, I wrote the table1 to the database:
connection('insert', 'table1', conn_table1)
I need to implement the following which is not working:
start_date="3/6/2015 12:44:31"
end_date="3/6/2020 12:44:31"
connection(f'select col1 datetimecol2 by col1 from table1 where col3<`2 & datetimecol2 within (`${start_date};`${end_date})')
How can I do it using KDB+? And where can I find a good resource to learn KDB+?
Thank you!
I expected to get some results but instead I got an error: QError: /
Your query builds to:
select col1 datetimecol2 by col1 from table1
where col3<`2 & datetimecol2 within (`$3/6/2015 12:44:31;`$3/6/2020 12:44:31)
The issues:
,
between the column names col1 datetimecol2
, this would cause datetimecol2
to attempt to be used to index in to col1
and is not what you intended to do I am guessing. If datetimecol2
is a symbol type this operation would fail with type
col1
to the left and right of by
, this is not allowed and would error with 'dup names for cols/groups col1
and /
). These must be wrapped in ""
col3<`2
doing a less than operation on a symbol datatype. While this can technically work it's more likely you want a numeric column and comparison&
by default. A ,
separates where clausesImplementing fixes 1-5 gives a workable query:
select datetimecol2 by col1 from table1
where col3<2, datetimecol2 within (`$"3/6/2015 12:44:31";`$"3/6/2020 12:44:31")
Your timestamp format can be parsed by q:
q)"P"$"3/6/2015 12:44:31"
2015.03.06D12:44:31.000000000
So in your schema you should use more similar to:
schema = {'col3':kx.LongAtom,
'datetimecol2':kx.TimestampAtom,
}
Then your query should pass parameters and not build up complex strings:
from datetime import datetime
start_date = datetime.strptime("3/6/2015 12:44:31",'%m/%d/%Y %H:%M:%S')
end_date=datetime.strptime("3/6/2020 12:44:31",'%m/%d/%Y %H:%M:%S')
connection('{select datetimecol2 by col1 from table1 where
col3<2, datetimecol2 within (x;y)}',
start_date, end_date)
See:
The KX Academy has courses you can follow to learn q/kdb+ and PyKX https://learninghub.kx.com/academy/