Search code examples
cpostgresqlpostgresql-extensions

String manipulation by C language function of PostgreSQL


I'd like to create a postgresql c-Language function whose arguments and return values are varchar and which uses Char [] during processing.

But, it does not work as expected.

I made a program of c like this.

#include <postgres.h>
#include <port.h>
#include <fmgr.h>
#include <stdlib.h>

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

// Required for windows.
extern PGDLLEXPORT Datum VARCHAR_CHAR_ARRAY_VARCHAR(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(VARCHAR_CHAR_ARRAY_VARCHAR);

Datum VARCHAR_CHAR_ARRAY_VARCHAR(PG_FUNCTION_ARGS) {
    // Get arg.
    VarChar  *arg1 = (VarChar *)PG_GETARG_VARCHAR_P(0);

    // VarChar to Char[].
    char *c = (char *)VARDATA(arg1);

    elog(NOTICE, "VarChar to Char[]."); 
    elog(NOTICE, c);//Log1

    // Do something.(e.g. replace)
    // Since it uses another system, it must be Char [].

    //Char[] to VarChar.
    VarChar *rtn = (VarChar *)VARDATA(c);
    elog(NOTICE, "Char[] to VarChar.");
    elog(NOTICE, rtn);//Log2

    // Return VarChar.
    PG_RETURN_VARCHAR_P(rtn);
}

And, created such a script.

CREATE OR REPLACE FUNCTION public.VARCHAR_CHAR_ARRAY_VARCHAR(character varying)
  RETURNS character varying AS
'$libdir/test/VARCHAR_CHAR_ARRAY_VARCHAR.dll', 'VARCHAR_CHAR_ARRAY_VARCHAR'
  LANGUAGE c VOLATILE STRICT;

The execution result is like this.

SELECT VARCHAR_CHAR_ARRAY_VARCHAR('a');
-- Expected value: 'a'
-- Postgresql terminated abnormally... :(

SELECT VARCHAR_CHAR_ARRAY_VARCHAR('Action');
-- Expected return value: 'Action'
-- Actual return value  :'n'
-- Log1  :'NOTICE:  n'
-- Log2  :''

SELECT VARCHAR_CHAR_ARRAY_VARCHAR('ActionAction');
-- Expected return value: 'ActionAction'
-- Actual return value  : ''  <-Why?
-- Log1  :''
-- Log2  :''

SELECT VARCHAR_CHAR_ARRAY_VARCHAR('Action Action');
-- Expected return value: 'Action Action'
-- Actual return value  : 'n Action'
-- Log1  :'NOTICE:  Action Action'
-- Log2  :'NOTICE:  ction'

SELECT VARCHAR_CHAR_ARRAY_VARCHAR('1234567890');
-- Expected return value: '1234567890'
-- Actual return value  : '6789'
-- Log1  :'NOTICE:  1234567890'
-- Log2  :'NOTICE:  90'

Execution environment:

  • Windows Server 2012 R2 x64
  • PostgreSQL 9.5.9, compiled by Visual C++ build 1800, 64-bit

I'm using C language for the first time, so I'm glad to present sample code.

I'm sorry, my English is not good.


Solution

  • Manually constructing a text or VarChar like that won't work. You can't just

    VarChar *rtn = (VarChar *)VARDATA(c);
    

    because what you're doing there is trying to interpret the char* value c as if it were a VarChar*, reading the first few bytes as a VARLENA header, and then getting some substring within it as the value. Not going to work.

    It's much easier to use cstring_to_text and text_to_cstring to convert. text* is exactly compatible with VarChar*. (They're in utils/builtins.h).

    e.g.

     PG_RETURN_VARCHAR_P(cstring_to_text(c));
    

    To do it manually, have to palloc a string with strlen bytes + VARHDRSIZE, SET_VARSIZE, then strcpy the data to the VARDATA offset. See src/include/utils/varlena.h and src/backend/utils/adt/varlena.c. It's fiddly and not necessary when PostgreSQL has helper functions for the purpose.

    Personally I recommend ignoring varchar completely for C-level programming. Define your functions as accepting and returning text.