Search code examples
pythonescapingkdb

Escape characters in a q-lang query via Python


I have issues applying escape sequences in a query to a kdb server.

The native query is:

select lo:min price, hi:max price by sym from trade where date = 2007.02.28, not cond like "*[BMPQTUWZ]*", corr <= 1

Any help is welcomed.

I am using Python to send it, and I put \" for " in the query in order to transmit the double quote sign:

from qpython import qconnection
import pandas as pd
from datetime import datetime

query = 'select lo:min price, hi:max price by sym from trade where date = 2007.02.28, not cond like \"*[BMPQTUWZ]*\", corr <= 1'

#query = '\"2+2\"'


print('Attempt to open a connection...')

q = qconnection.QConnection(host=server, port=server_port, username=user, password=server_password, timeout=server_timeout, pandas = True)
q.open()
print('Connection established...')

print('Attempt to send the query', query)
df = pd.DataFrame(q.sendSync(query))
print('Query <<', query, '>> sent...')

Output for '"2+2"' and output for 'select lo:min price, hi:max price by sym from trade where date = 2007.02.28, not cond like \"[BMPQTUWZ]\", corr <= 1' is listed below.

PS G:\atom-projects\test> python.exe .\test-1.py
Attempt to open a connection...
Connection established...
Attempt to send the query "2+2"
Traceback (most recent call last):
  File ".\test-1.py", line 23, in <module>
    df = pd.DataFrame(q.sendSync(query))
  File "C:\Python38\lib\site-packages\pandas\core\frame.py", line 509, in __init__
    raise ValueError("DataFrame constructor not properly called!")
ValueError: DataFrame constructor not properly called!


PS G:\atom-projects\test> python.exe .\test-1.py
Attempt to open a connection...
Connection established...
Attempt to send the query select lo:min price, hi:max price by sym from trade where date = 2007.02.28, not cond like "*[BMPQTUWZ]*", corr <= 1
Traceback (most recent call last):
  File ".\test-1.py", line 23, in <module>
    df = pd.DataFrame(q.sendSync(query))
  File "C:\Python38\lib\site-packages\qpython\qconnection.py", line 303, in sendSync
    response = self.receive(data_only = False, **options)
  File "C:\Python38\lib\site-packages\qpython\qconnection.py", line 380, in receive
    result = self._reader.read(**self._options.union_dict(**options))
  File "C:\Python38\lib\site-packages\qpython\qreader.py", line 138, in read
    message = self.read_header(source)
  File "C:\Python38\lib\site-packages\qpython\qreader.py", line 158, in read_header
    header = self._read_bytes(8)
  File "C:\Python38\lib\site-packages\qpython\qreader.py", line 388, in _read_bytes
    data = self._stream.read(length)
  File "C:\Python38\lib\socket.py", line 669, in readinto
    return self._sock.recv_into(b)
socket.timeout: timed out

Now, I have trouble with escape characters.

If I send the query to the kdb server.

query = 'select lo:min price, hi:max price by sym from trade where date = 2007.02.28, corr <= 1'

the query is transmitted. But when I add not cond like \"[BMPQTUWZ]\", there is an error.

OS and Python language details: Windows 10, x64, Python 3.8.1


Solution

  • Escaping works fine for me:

    q)t:([]sym:100?`1;cond:100#`Buy`Sell`Example;corr:100?2f;price:100?100f)
    q)t
    sym cond    corr      price   
    ------------------------------
    n   Buy     1.339583  82.48839
    m   Sell    1.17743   95.01603
    a   Example 1.135715  54.35053
    o   Buy     0.7889166 29.12758
    b   Sell    1.851226  45.29782
    …
    
    df = q.sendSync('select hi:max price, lo:min price, cond:first cond  by sym from t where not cond like \"[BS]*\"')
    
    df
        hi  lo  cond
    sym             
    b'a'    64.230790   24.164567   b'Example' 
    b'b'    60.669536   12.766243   b'Example'
    b'c'    85.688555   79.785111   b'Example'
    b'd'    83.056125   83.056125   b'Example' 
    b'e'    73.149409   73.149409   b'Example'
    b'f'    93.359214   36.638445   b'Example' 
    ...
    

    The '\"2+2\"' fails because 4 is returned and this can't be presented in a DataFrame.

    Your select seems to be failing due a timeout. Perhaps try increasing your time out.

    Also what are you trying to achieve with the select? My example is not where cond starts with either B or S. So only returns Example.

    Conor's suggestion above may be faster and better if cond is a single character column and not a symbol. If it is a symbol column like mine above with whole words. Like/Regex must be used.