Search code examples
pythonpandaskdbq-langexxeleron-q

Passing pandas DataFrame containing string column to kdb+ (using qPython API)


exxeleron/qPython module allows to send the pandas DataFrame to kdb+/q's table.

Let's prepare the data:

import pandas.io.data as web
import datetime
import numpy

start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f = web.DataReader(["F","MSFT"], 'yahoo', start, end) # download stock data from Yahoo Finance
f = f.to_frame().reset_index() # flatten the MultiIndex to have a sym column, see below
f = f[["Date","minor","Close"]]
f.columns = ["dt","sym","val"] # just give comfortable names

The DataFrame object to be passed then looks like:

f.head()
# Out: 
#           dt   sym    val
# 0 2010-01-04     F  10.28
# 1 2010-01-04  MSFT  30.95
# 2 2010-01-05     F  10.96
# 3 2010-01-05  MSFT  30.96
# 4 2010-01-06     F  11.37

f.dtypes
# Out: 
# dt     datetime64[ns]
# sym            object
# val           float64    

When I try to send it to kdb+/q, I get the following error:

import qpython.qconnection as qconnection
q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True)
q.open()
q('set', numpy.string_('tbl'), f)

# File "G:\Anaconda\lib\site-packages\qpython\_pandas.py", line 159, in _write_pandas_series
#   data = data.fillna(QNULLMAP[-abs(qtype)][1])
# KeyError: -10

Solution

  • The sym column in the DataFrame is ambiguous for the qpython and it cannot determinate the default serialization properly. In such case, you have to provide type hinting for columns conversion, by setting the meta attribute:

    from qpython import MetaData 
    from qpython.qtype import QSYMBOL_LIST
    
    f.meta = MetaData(sym = QSYMBOL_LIST)
    q('set', numpy.string_('tbl'), f)
    

    This instructs qpython to serialize sym column as q symbol list:

    q)meta tbl                              
    c  | t f a                              
    ---| -----                              
    dt | p                                  
    sym| s                                  
    val| f 
    
    q)tbl                                   
    dt                            sym  val  
    ----------------------------------------
    2010.01.04D00:00:00.000000000 F    10.28
    2010.01.04D00:00:00.000000000 MSFT 30.95
    2010.01.05D00:00:00.000000000 F    10.96
    ..
    

    Alternatively, you can represent sym column as a q generic list containing strings. You can also apply type conversion to other columns:

    from qpython import MetaData 
    from qpython.qtype import QSTRING_LIST, QINT_LIST, QDATETIME_LIST
    
    f.meta = MetaData(sym = QSTRING_LIST, val = QINT_LIST, dt = QDATETIME_LIST)
    q('set', numpy.string_('tbl'), f)
    

    which results in:

    q)meta tbl                          
    c  | t f a                          
    ---| -----                          
    dt | z                              
    sym|                                
    val| i  
    
    q)tbl                               
    dt                      sym    val  
    ----------------------------------  
    2010.01.04T00:00:00.000 "F"    10   
    2010.01.04T00:00:00.000 "MSFT" 30   
    ..