Tags: kdb, pykx

PyKX fails when calling pandas.read_html()


I am replacing embedPy with PyKX.

I have installed it, and it works with other functions.

This call, however, does not work:

    \l pykx.q
    pd:.pykx.import`pandas;
    test:"<table><tr><th>First Name</th><th>Last Name</th><th>Age</th></tr><tr><td>John</td><td>Doe</td><td>30</td></tr><tr><td>Jane</td><td>Smith</td><td>25</td></tr><tr><td>Emily</td><td>Jones</td><td>22</td></tr></table>";
    pd[`:read_html][test][@;0];  / read_html returns a list of DataFrames; [@;0] takes the first

This works in embedPy but fails in PyKX with the following error:

TypeError("cannot parse from 'numpy.ndarray'")

Any ideas?


Solution

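  • By default, PyKX converts a q char vector (CharVector) to a NumPy array of bytes rather than to a Python str, and pandas.read_html() cannot parse an ndarray. This is one of the documented functional differences from embedPy:

    https://code.kx.com/pykx/2.5/pykx-under-q/upgrade.html#functional-differences

    Printing the argument on the Python side confirms this:
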
    q).pykx.eval["lambda x: print(type(x))"][test];
    <class 'numpy.ndarray'>
    
    q).pykx.eval["lambda x: print(x)"][test];
    [b'<' b't' b'a' b'b' b'l' b'e' b'>' b'<' b't' b'r' b'>' b'<' b't' b'h' ...
    

    To pass a str instead, define a helper that joins the bytes and decodes them, and apply it to the q string before calling read_html:

    q)b2s:.pykx.eval["lambda x: x.tobytes().decode('UTF-8')"]
    q).pykx.print pd[`:read_html]b2s[test]
    
    [  First Name Last Name  Age
    0       John       Doe   30
    1       Jane     Smith   25
    2      Emily     Jones   22]
    

    More information on the default conversions:

    https://code.kx.com/pykx/2.5/pykx-under-q/intro.html#function-argument-types

    and on handling text in PyKX:

    https://code.kx.com/pykx/2.5/user-guide/fundamentals/text.html
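The q helper `b2s` runs a Python lambda, so its behaviour can be reproduced in plain Python. The sketch below assumes, based on the printed output above, that PyKX hands Python a NumPy array of single bytes (dtype `S1`); it simulates that array with `np.frombuffer` rather than calling PyKX. The extra `bytes`/`str` branches are an illustrative addition, not part of PyKX or the original answer.

```python
import numpy as np


def b2s(x):
    """Turn what PyKX passes for a q char vector back into a Python str.

    Assumption: x is a NumPy array of single bytes (dtype 'S1'), as in the
    printed output of the q session above. bytes and str inputs are also
    accepted so the helper is harmless on already-converted values.
    """
    if isinstance(x, np.ndarray):
        # Concatenate the raw bytes of every element, then decode as UTF-8
        return x.tobytes().decode("utf-8")
    if isinstance(x, bytes):
        return x.decode("utf-8")
    return x  # already a str (or another type we leave untouched)


# Simulate the argument PyKX builds from the q string "<table>"
arr = np.frombuffer(b"<table>", dtype="S1")
print(type(arr))   # <class 'numpy.ndarray'>
print(arr)         # [b'<' b't' b'a' b'b' b'l' b'e' b'>']
print(b2s(arr))    # <table>
```

Because `tobytes()` concatenates the raw bytes before decoding, multi-byte UTF-8 characters that q stores across several chars are reassembled correctly. Recent pandas versions (2.1 and later) also emit a FutureWarning when `read_html` is given literal HTML; wrapping the decoded string in `io.StringIO` avoids it.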