Search code examples
rpostgresqlggplot2plr

Using geom_function in ggplot2 to call a function written in postgreSQL plr


I struggle to call a plr function in postgreSQL from a R-script and use it in ggplot2 - geom_function. The following examples are extremly simplified but hopefully show the issue.

Let's assume I have the following plr function:

CREATE OR REPLACE FUNCTION public.mypgfunc(
    x numeric,
    a numeric)
    RETURNS numeric
    LANGUAGE 'plr'
    COST 100
    VOLATILE PARALLEL UNSAFE
AS $BODY$
    return (x*a)
$BODY$;

To call this from ggplot - geom_function I thought, I could write the following simple wrapper function to do the query (I use the rpostgres package):

myWrapper <- function(x , a) {
  con <- dbConnect(drv = RPostgres::Postgres() , dbname='mydb')
  
  q <- dbSendQuery(con , "select mypgfunc( $1 , $2 )")
  dbBind(q , c(x,a))
  y <- dbFetch(q)
  dbClearResult(q)
  
  dbDisconnect(con)
  
  return(y)
}

But if I now call this function from ggplot, I get the following warning message and an empty plot:

Computation failed in stat_function(): Query requires 2 params; 102 supplied.

The ggplot code looks as follows:

ggplot() +
  geom_function(fun = myWrapper , args = list(a = 5))

If I write the plr function in R instead and call this from geom_function, everything works fine. If I call the myWrapper directly (outside ggplot) with just one value for x and a, respectively, everything works fine as well.

So, what do I have to change?


Solution

  • I don't now much about using SQL databases in R. But I think I know the reason of your error.

    If you look at the help page for stat_function (which is what geom_function uses under the hood), you will see that, by default, it creates 101 values of x over an arbitrary range (I think it is [0,1]).

    This is why the error message mentions that "Query requires 2 params; 102 supplied". You are concatenating x (which is a vector of length 101) with a (length 1), when you call dbBind(q , c(x,a)).

    The size of x is defined by the parameter n. So if you run:

    ggplot() + geom_function(fun = myWrapper , args = list(a = 5), n=1)
    

    You are passing a single value of x to myWrapper(I think x=0) and you should get the same result as the situation you described in the last sentence (I got a warning message because ggplot could not draw a line with only 1 point).

    So, basically, you need to do separate queries for each value of the vector x. A straight forward way of doing this is looping over the values of x:

    y <- NULL
    for (xx in x) {
        q <- dbSendQuery(con , "select mypgfunc( $1 , $2 )")
        dbBind(q , c(xx, a))
        if (is.null(y)) {
            y <- dbFetch(q)
        } else {
            y <- rbind(y, dbFetch(q))
        }        
        dbClearResult(q)
    }
    

    Like I said, I have not worked with SQL in R. So I am sure there are smarter ways of doing this where you don't need to call an ugly for loop (not sure also if you need the dbSendQuery inside the loop).