Search code examples
pandaskdb+qpython

QPython Pandas Interaction


I have a question pertaining to Pandas Data Frame which I want to enrich with Timings from Tick Source(kdb Table).

Pandas DataFrame
Date         sym   Level
2018-07-01   USDJPY    110
2018-08-01   GBPUSD    1.20

I want to enrich this dataframe with timings (first time for a given currency pair for a given date when the level is crossed).

from qpython import qconnection
from qpython import MetaData
from qpython.qtype import QKEYED_TABLE
from qpython.qtype import QSTRING_LIST, QINT_LIST, 
QDATETIME_LIST,QSYMBOL_LIST
q.open()
df.meta = MetaData(sym = QSYMBOL_LIST, val = QINT_LIST, Date = 
QDATE_LIST)
q('set', np.string_('tbl'), df)

The above code converts pandas dataframe to q table.

Example Code to Access tick data(kdb Tables)

select Mid by sym,date from quotestackevent where date = 2018.07.01, sym = `CCYPAIR

How can I use dataframe columns sym and date to pull data from kdb tables using Qpython?


Solution

  • Suppose on the KDB+ side you have a table t with columns sym (of type symbol), date (of type date), and mid (of type float), for example generated by the following code:

    t:`date xasc ([] sym:raze (3#) each `USDJPY`GBPUSD`EURBTC;date:9#.z.d-til 3;mid:9?`float$10)
    

    Then to bring the data for enrichment from the KDB+ side to the Python side you can do the following:

    from qpython import qconnection
    import pandas as pd
    
    df = pd.DataFrame({'Date': ['2018-09-08','2018-09-08','2018-09-07','2018-09-07'],'sym':['abc','def','abc','def']})
    df['Date']=df['Date'].astype('datetime64[ns]')
    
    with qconnection.QConnection(host = 'localhost', port = 5001, pandas = True) as q:
        X = q.sync('{select sym,date,mid from t where date in `date$x}',df['Date'])
    

    Here the first argument to q.sync() defines a function to be executed and the second argument is the range of dates you want to get from the table t. Inside the function the `date$x part converts the argument to a list of dates, which is needed because df['Date'] is sent as a list of timestamps to the KDB+ side.

    The resulting X data frame will have the sym column as binary strings, so you may want to do something like

    X['sym'].apply(lambda x: x.decode('ascii'))
    

    to convert that to strings.

    An alternative to sending the function definition is to have a function defined on the KDB+ side and send only its name from the Python side. So, if you can do something like

    getMids:{select sym,date,mid from t where date in `date$x}
    

    on the KDB+ side, then you can do

    X = q.sync('getMids',df['Date'])
    

    instead of sending the function definition.