Search code examples
kdb

What could be causing this 'invalid host' error on kdb query?


I get an odd error when trying to query too many dates from a date-partitioned historical database:

q)eod: h"select from eod where date within 2018.01.01 2018.04.22"
'/tablepath/2018.04.04/eod/somecolumn: invalid host
q)eod: h"select from eod where date within 2018.01.17 2018.04.20"
'/tablepath/2018.04.20/eod/othercolumn: invalid host
q)eod: h"select from eod where date within 2018.01.18 2018.04.20"
q)

Note that both dates mentioned in the error messages are within the date range that we manage to extract in the end, and that it fails on a different column each time. This seems to indicate it's something to do with the size of the table being pulled, but when we check the size of the largest table we managed to get:

q)(-22!eod) % 1024 * 1024
646.9043
q)count eod
2872546

we find that it's not particularly large by either memory size nor by number of rows.

Googling for "invalid host" errors doesn't seem to turn up anything relevant, and I'm not seeing anything in the kdb docs about size limits that would be relevant. Anyone got any ideas?

Edit:

When loading the table in a session and making the queries directly, we get what appears to be the same error, but with a different message. For instance:

q)jj: select from eod where date within 2018.01.01 2018.04.22
Too many compressed files open
k){0!(?).@[x;0;p1[;y;z]]}
'./2018.04.04/eod/settlecab: No such file or directory
.
?
(+`exch`date`class..
q.Q))

Note that the file ./2018.04.04/eod/settlecab does in fact exist, and contains data: I have no problem loading the data for just the date mentioned in the error, and the column mentioned has meaningful values:

q)jj: select from eod where date=2018.04.04
q)select count i by settlecab from jj
settlecab| x    
---------| -----
0        | 41573
1        | 2269

The key point seems to be the Too many compressed files open message, but what can I do about this?

Edit for Summary/Solutions:

The table in question had many columns, all stored in a compressed format. When issuing a query against too many dates at once, kdb would try to mmap all of those columns at once, running into a limit on how many compressed files could be open at once.

Once I understood the problem, several solutions were available:

  1. I could pull only certain columns from the database, reducing the number of files that kdb needed to keep open,
  2. I could force kdb to pull all the data into memory by adding a dummy where clause to the query, such as (null column) | not null column (hacky, but it works),
  3. I could have upgraded the kdb version and lifted OS limits (not practical in my case).

I still have no idea why this resulted in an invalid host error when querying the database remotely.


Solution

  • First off, can we just clarify the database structure you're working with. It seems from the filepaths returned in your errors that you've got a date-partitioned database. Did you mean non-segmented database when you said non-partitioned in your original query?

    In terms of a fix for your issue, have you tried loading your database into a session, and making those queries directly? If so do you get the same issues?

    If that seems to be working alright, the problem might lie with how you're defining your database handle. How is h defined in your original example?

    It might also be worth trying to select individual dates from your database, to try and isolate the problem, and to determine if it lies with your on-disk data. Try specifically querying the dates that are mentioned in your errors.

    You could also try performing your original queries with a subset a columns, again to try and pinpoint where your issue is coming from.

    Let us know if you get any further with this.

    Joseph