Search code examples
sqlpostgresqlodbckdb

How to pull/read data from an external SQL server in batches ard store all in one kdb+ table


We have an external PostgreSQL server where we pull data from using odbc. We typically pull data in chunks as we need, so sometimes pull all date "where date > (some specific date)" or "where id > (some specified id)" or between certain dates (like start_date and end_date). Example code below

The above conditions are pretty straight forward and can be implemented without burdening the server too much.

But it's a different story when we need to pull the entire server data (too much data on the server). Is there a way to pull data from the server in batches so an to not overburden the server?

Below is an example that we use to

getdata:{[]
 query: "select data.id, data.first_name, data.last_name, data.email, data.created_at from data  where data.created_at > '2020-02-04' order by id asc" ];
 us::.odbc.open `dbs;
 leads::.odbc.eval[us; query];
 .odbc.close us;
 };

dbs = server name


Solution

  • Could you create date batches on the kdb side and generate a startDate-endDate query for each? e.g. create 5 years of start/end dates starting at 2002.01.01

    q){(`date$s),'-1+`date$12+s:(12*til y)+`month$x}[2002.01.01;5]
    2002.01.01 2002.12.31
    2003.01.01 2003.12.31
    2004.01.01 2004.12.31
    2005.01.01 2005.12.31
    2006.01.01 2006.12.31
    

    I guess you would have to loop over each table on the server also.