exxeleron/qPython
module allows to send the pandas DataFrame
to kdb+/q
's table.
Let's prepare the data:
import pandas.io.data as web
import datetime
import numpy
start = datetime.datetime(2010, 1, 1)
end = datetime.datetime(2015, 2, 6)
f = web.DataReader(["F","MSFT"], 'yahoo', start, end) # download stock data from Yahoo Finance
f = f.to_frame().reset_index() # flatten the MultiIndex to have a sym column, see below
f = f[["Date","minor","Close"]]
f.columns = ["dt","sym","val"] # just give comfortable names
The DataFrame
object to be passed then looks like:
f.head()
# Out:
# dt sym val
# 0 2010-01-04 F 10.28
# 1 2010-01-04 MSFT 30.95
# 2 2010-01-05 F 10.96
# 3 2010-01-05 MSFT 30.96
# 4 2010-01-06 F 11.37
f.dtypes
# Out:
# dt datetime64[ns]
# sym object
# val float64
When I try to send it to kdb+/q
, I get the following error:
import qpython.qconnection as qconnection
q = qconnection.QConnection(host = 'localhost', port = 5000, pandas = True)
q.open()
q('set', numpy.string_('tbl'), f)
# File "G:\Anaconda\lib\site-packages\qpython\_pandas.py", line 159, in _write_pandas_series
# data = data.fillna(QNULLMAP[-abs(qtype)][1])
# KeyError: -10
The sym
column in the DataFrame
is ambiguous for the qpython
and it cannot determinate the default serialization properly. In such case, you have to provide type hinting for columns conversion, by setting the meta
attribute:
from qpython import MetaData
from qpython.qtype import QSYMBOL_LIST
f.meta = MetaData(sym = QSYMBOL_LIST)
q('set', numpy.string_('tbl'), f)
This instructs qpython
to serialize sym
column as q symbol list:
q)meta tbl
c | t f a
---| -----
dt | p
sym| s
val| f
q)tbl
dt sym val
----------------------------------------
2010.01.04D00:00:00.000000000 F 10.28
2010.01.04D00:00:00.000000000 MSFT 30.95
2010.01.05D00:00:00.000000000 F 10.96
..
Alternatively, you can represent sym
column as a q generic list containing strings. You can also apply type conversion to other columns:
from qpython import MetaData
from qpython.qtype import QSTRING_LIST, QINT_LIST, QDATETIME_LIST
f.meta = MetaData(sym = QSTRING_LIST, val = QINT_LIST, dt = QDATETIME_LIST)
q('set', numpy.string_('tbl'), f)
which results in:
q)meta tbl
c | t f a
---| -----
dt | z
sym|
val| i
q)tbl
dt sym val
----------------------------------
2010.01.04T00:00:00.000 "F" 10
2010.01.04T00:00:00.000 "MSFT" 30
..