Search code examples
postgresqlsql-insertupsertpostgresql-9.4udf

UDF to encode strings works in psql and Perl but not in Python


I wrote a user defined function in Postgres 9.4 to encode strings:

CREATE OR REPLACE FUNCTION platform.encode_sig(sig text)   
RETURNS bigint AS $BODY$ 
  declare   sig_id bigint; 
begin
    lock table platform.sig2encodings in access exclusive mode;   
    execute 'select sig_id from platform.sig2encodings where sig = ''' || sig || '''' into sig_id;

    if sig_id is null   
    then
       raise notice 'I do not have encoding for %', sig;
       execute 'insert into platform.sig2encodings (sig) values (''' || sig || ''')';
       execute 'select sig_id from platform.sig2encodings where sig = ''' || sig || '''' into sig_id;   
    else
       raise notice 'I do have encoding for %', sig;   
    end if;

  return sig_id;

END; 
$BODY$   
LANGUAGE plpgsql VOLATILE   COST 100;

The table:

CREATE TABLE platform.sig2encodings
(
  sig_id bigserial NOT NULL,
  sig text,
  CONSTRAINT sig2encodings_pkey PRIMARY KEY (sig_id ),
  CONSTRAINT sig2encodings_sig_key UNIQUE (sig )
)

The call in pgadmin or psql inserts the data into the table:

select * from platform.encode_sig('NM_Gateway_NL_Shutdown');

The call in python gets the id, but does not insert the data:

db="""dbname='XXX' user='XXX' password='XXX' host=XXX port=XXX"""

def encode_sig(sig):
   try:
      conn=psycopg2.connect(db)
   except:
      print "I am unable to connect to the database."
      exit()

   cur = conn.cursor()
   try:
      sql = "select * from platform.encode_sig('" + sig + "');"
      print sql
      cur.execute(sql)
   except:
      print "I can't retrieve sid"

   row = cur.fetchone()
   return row[0]

print str(encode_sig('NM_Gateway_UDS_CC'))

Output from the python script:

$ ./events_insert.py 
616
617
618
619
620
621
$ ./events_insert.py 
622
623
624
625
626
627

The table in postgres is empty. What is going on?

Update:

The following perl script works (with all the console outputs (NOTICEs) and the rows in the table):

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;
use DBI;

my $dbh = get_connection();
$dbh->do("SELECT platform.encode_sig('blah blah blah')");
$dbh->disconnect();

sub get_connection {
    return DBI->connect('dbi:Pg:dbname=XXX;host=XXX;port=XXX',
                        'XXX', 'XXX', { RaiseError => 1 });
}

The DB configuration is pretty standard one. These lines come from the postgresql.conf (since they are commented out, default values are assumed):

#fsync = on                             # turns forced synchronization on or off
#synchronous_commit = on                # synchronization level;
                                        # off, local, remote_write, or on
#wal_sync_method = fsync                # the default is the first option
                                        # supported by the operating system:
                                        #   open_datasync
                                        #   fdatasync (default on Linux)
                                        #   fsync
                                        #   fsync_writethrough
                                        #   open_sync
#full_page_writes = on                  # recover from partial page writes
#wal_log_hints = off                    # also do full page writes of non-critical updates
                                        # (change requires restart)
#wal_buffers = -1                       # min 32kB, -1 sets based on shared_buffers
                                        # (change requires restart)
#wal_writer_delay = 200ms               # 1-10000 milliseconds

#commit_delay = 0                       # range 0-100000, in microseconds
#commit_siblings = 5                    # range 1-1000

Solution

  • It's unclear, how the table can be empty after you see sig_id's returned. The only plausible explanations that come to mind:

    • You are checking a different table (in a different schema or a different DB) by accident.
    • You are running with auto_commit = off and forgot to COMMIT your transaction. Results are not visible to other sessions before COMMIT.

    Either way, your function is needlessly convoluted, you wouldn't need dynamic SQL with EXECUTE. Since you are concatenating unescaped text parameters into code, you are wide open to random syntax errors and SQL injection.
    You are also dangerously close to naming conflicts between the parameter name sig and the column name sig. You ditch this last bullet with your dynamic SQL, but it's still a loaded foot-gun. Read the chapter Variable Substitution for PL/pgSQL in the manual and consider unique names.

    Finally, it is also extremely inefficient to have one function call per row. The whole procedure can be replaced with this single SQL statement:

    LOCK TABLE platform.sig2encodings IN ACCESS EXCLUSIVE MODE;
    
    WITH sel AS (
       SELECT e.sig_id, e.sig
           , (s.sig IS NULL) AS insert_new
       FROM   platform.encode_sig e
       LEFT   JOIN platform.sig2encodings s USING (sig)
       )
    ,    ins AS (
       INSERT INTO platform.sig2encodings (sig)
       SELECT sig FROM sel WHERE insert_new
       RETURNING sig_id, sig, true  -- value for insert_new
       )
    SELECT * FROM sel WHERE NOT insert_new
    UNION ALL
    SELECT * FROM ins;
    

    This inserts all sig from encode_sig into sig2encodings that were not there, yet. It returns the resulting sig_id, sig and insert_new = true, appended to sig_id, sig and insert_new = false from encode_sig that were not inserted.

    If you need a function for single row INSERT-or-SELECT that's safe for concurrent use:

    Or you hope that INSERT .. ON CONFLICT IGNORE makes it into the next release to simplify things:

    Update: It has been committed for 9.5. The /devel manual already has instructions.