
How to send a data.frame from R to Q/KDB?


I have a large data.frame (15 columns and 100,000 rows) in an existing R session that I want to send to a Q/KDB instance. From KDB's cookbook, the possible solutions are:

RServer for Q: use KDB to create a new R instance that shares memory space with Q. This doesn't work because my data is in an existing instance of R.

RServe: run an R server and use TCP/IP to communicate with the Q/KDB client. This does not work because, per RServe's documentation, "every connection has a separate workspace and working directory", so I presume it would not see my existing data.

R Math Library: access R's functionality via a math library without needing an instance of R. This does not work because my data is already in an instance of R.

So any other ideas on how to send data from R to Q/KDB?


Solution

  • Open a port in Q. I start Q with a batch file:

    @echo off
    c:\q\w32\q -p 5001
    

    Load qserver.dll into the existing R session:

    tryCatch({
      dyn.load("c:/q/qserver.dll")   # load the Kx R-to-Q interface DLL
    }, error = function(e) {
      print("can't load qserver.dll")
    })
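    As an optional sanity check (is.loaded is base R; it just confirms the C entry points from the DLL are visible):

    # should return TRUE once qserver.dll has loaded successfully
    is.loaded("kx_r_open_connection")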
    

    Then define these wrapper functions around the DLL's entry points:

    # Open a connection to the Q process and stash the handle globally
    open_connection <- function(host = "localhost", port = 5001, user = NULL) {
      parameters <- list(host, as.integer(port), user)
      h <- .Call("kx_r_open_connection", parameters)
      assign(".k.h", h, envir = .GlobalEnv)
      return(h)
    }

    # Close a previously opened connection handle
    close_connection <- function(connection) {
      .Call("kx_r_close_connection", as.integer(connection))
    }

    # Send a query string to Q over the connection and return the result
    execute <- function(connection, query) {
      .Call("kx_r_execute", as.integer(connection), query)
    }

    # Open the connection globally, on the same port Q was started with (5001 above)
    d <<- open_connection(host = "localhost", port = 5001)

    # Concatenate all arguments into a single query string and execute it
    ex2 <- function(...) {
      query <- list(...)
      theResult <- NULL
      for (i in query) theResult <- paste0(theResult, i)
      return(execute(d, theResult))
    }
    

    ex2 can then take multiple arguments, so you can build up queries from a mix of R variables and strings, as in the example below.
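    For instance (a hypothetical query; the table t only exists once this call creates it on the Q side):

    n <- 100
    ex2("t:([] x:til ", n, "; y:", n, "?1.0)")   # build an n-row table t in Q
    execute(d, "count t")                        # returns 100 if it worked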

    Edit: that covers querying Q from R; here's sending data from R to Q.

    2nd Edit: an improved algorithm:

    library(stringr)

    # Serialize an R matrix into one Q table-definition query and send it with ex2
    RToQTable <- function(Rtable, Qname, withColNames = TRUE, withRowNames = TRUE, colSuffix = NULL)
    {
      # Use the matrix's column names, or generate col1, col2, ... when absent
      theColnames <- if (!withColNames || length(colnames(Rtable)) == 0) {
        paste0("col", as.character(1:length(Rtable[1, ])), colSuffix)
      } else {
        colnames(Rtable)
      }
      if (!withRowNames || length(rownames(Rtable)) == 0) withRowNames <- FALSE
      # Append a sentinel row so str_split can locate the column boundaries
      Rtable <- rbind(Rtable, "linesep")
      # Build a 'name:("v1";"v2";...);' chunk per column; charnum drops the
      # 11 trailing sentinel characters (';"linesep);') from the last chunk
      charnum <- as.integer(nchar(thestr <- paste(paste0(theColnames, ':("',
        str_split(paste(Rtable, collapse = '";"'), ';"linesep";"')[[1]], ');'),
        collapse = "")) - 11)
      # .Internal(substr(...)) is a speed hack that skips argument checking
      if (withRowNames) {
        ex2(Qname, ":([]", Qname,
            str_replace_all(paste0("`", paste(rownames(Rtable), collapse = "`")), " ", "_"),
            ";", .Internal(substr(thestr, 1L, charnum)), "))")
      } else {
        ex2(Qname, ":([]", .Internal(substr(thestr, 1L, charnum)), "))")
      }
    }
    
    > library(microbenchmark)
    > bigMat <- matrix(runif(1500000), nrow = 100000, ncol = 15)
    > microbenchmark(RToQTable(bigMat, "Qmat"), times = 3)
    Unit: seconds
                          expr      min     lq     mean   median       uq      max neval
     RToQTable(bigMat, "Qmat") 10.29171 10.315 10.32766 10.33829 10.34563 10.35298     3
    

    This works as-is for a matrix. For a data frame, save a vector of each column's type, convert the data frame to a (character) matrix, import the matrix into Q, and then cast each column back to its proper type on the Q side, as sketched below.
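    A minimal sketch of that round trip, assuming the RToQTable and ex2 definitions above. sendDataFrame and its type mapping are hypothetical; the Q type characters ("F" for floats, "S" for symbols) must be adapted to your actual columns:

    # Send a data.frame by flattening to a character matrix, then cast in Q
    sendDataFrame <- function(df, Qname) {
      qtypes <- ifelse(sapply(df, is.numeric), "F", "S")   # crude type map
      RToQTable(as.matrix(df), Qname, withRowNames = FALSE)
      # Cast each column on the Q side, e.g. update x:"F"$x from `tab
      for (i in seq_along(df))
        ex2("update ", colnames(df)[i], ':"', qtypes[i], '"$',
            colnames(df)[i], " from `", Qname)
    }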

    Note that this algorithm is approximately O(rows * cols^1.1), so if you have more than about 20 columns you'll need to chop them up into multiple matrices to keep the cost near O(rows * cols); see the sketch at the end of this answer.

    But for your example of 100,000 rows and 15 columns it takes about 10 seconds, so further optimization may not be necessary.
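    A hypothetical way to do that chopping (sendWideMatrix is not from the original answer; the chunk size of 20 follows the note above, and ,' is Q's column-wise table join):

    # Send a wide matrix as several narrower ones, then join them in Q
    sendWideMatrix <- function(mat, Qname, chunk = 20) {
      starts <- seq(1, ncol(mat), by = chunk)
      for (j in seq_along(starts)) {
        cols <- starts[j]:min(starts[j] + chunk - 1, ncol(mat))
        # colSuffix keeps generated column names unique across chunks
        RToQTable(mat[, cols, drop = FALSE], paste0(Qname, j),
                  withRowNames = FALSE, colSuffix = paste0("_", j))
      }
      # e.g. Qmat:Qmat1,'Qmat2,'Qmat3 pastes the pieces back together
      ex2(Qname, ":", paste0(Qname, seq_along(starts), collapse = ",'"))
    }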