Search code examples
pythonpostgresqlexecutedatabase-cursor

Python hangs during a cursor.execute() using Psycopg2


I currently have a problem with my Python script that when I execute a SQL function from cursor.execute() for a specific function, It runs the function, function completes, and then the script hangs for somewhere close to five hours before resuming the rest of the program.

I watch the server status of my function in pgAdminIII as it is called from Python and when the function completes the server just shows an open connection to my python-user. Before it would idle in transaction, but after adding conn.set_isolation_level(0), it only shows the open connection.

I have also put prints surrounding the cursor.execute() but it takes five hours to proceed to the following print statement. I have tried killing the connection and restarting it to no avail. I stripped the function call completely out of my script so it was the only process my script handled. Nothing has worked.

The difficulty is that both sides keep pointing to the other as the source of the problem. I am assuming it is some sort of interaction effect from this specific function. Perhaps a signal is not being sent back to python from postgres after running the specific function? None of the other functions have an issue with the same method of execution.

Ideas? Solutions? -I was considering a timeout, but the runtime can vary anywhere from 20 minutes to 2 hours. -The function it runs makes a table and returns only a string stating 'Complete', all operations are accounted for in the database after it disappears as an active function in server status.

Here is the code I use for replicating the problem, isolated from my main script:

#!/usr/bin/env python

import sys
import os
import smtplib
import ftplib
import gzip
import logging
import shutil
import argparse
import traceback
import subprocess
from email.mime.text import MIMEText

import psycopg2
import psycopg2.extras
import psycopg2.extensions
from psycopg2 import OperationalError

pg_db    = "ppl"
pg_port  = "5432"
pg_user  = "<user>"
pg_pass  = "<pass>"
pg_host  = "<host>"

# connect to database
try:
    conn = psycopg2.connect(database=pg_db,
                            port=pg_port, 
                            user=pg_user, 
                            password=pg_pass, 
                            host=pg_host)
    conn.set_isolation_level(0)
except:
    print "Unable to connect to the database."

cur = conn.cursor()

print 'Starting select <function1>'
cur.execute("select <function1>(999999, null );")
print 'Assign variable...'
a = cur.fetchall()
print 'Done'

print 'Starting select <function2>'
cur.execute("select <function2>(999999, null );")
print 'Assign variable...'
b = cur.fetchall()
print 'Done'

print a
print b

conn.close()
sys.exit()

Then I ran a strace as suggested with these results:

<previous function statements for about 4k lines>

write(1, "Starting select <function>"..., 45) = 45
sendto(3, "Q\0\0\0009select <function>"..., 58, MSG_NOSIGNAL, NULL, 0) = 58
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "T\0\0\0001\0\1<function_name>\0"..., 16384, 0, NULL, NULL) = 500
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0mSINFO\0C00000\0Mv_eval_table "..., 16384, 0, NULL, NULL) = 110
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\4\247SINFO\0C00000\0Mcmd := CREATE"..., 16384, 0, NULL, NULL) = 1192
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\23hSINFO\0C00000\0Mcmd := \t--For"..., 16384, 0, NULL, NULL) = 4969
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0SSINFO\0C00000\0Mlicense_row_i"..., 16384, 0, NULL, NULL) = 84
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0SSINFO\0C00000\0Mlicense_row_i"..., 16384, 0, NULL, NULL) = 672
...
brk(0xbe9000)                           = 0xbe9000
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0SSINFO\0C00000\0Mlicense_row_i"..., 16384, 0, NULL, NULL) = 84
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0SSINFO\0C00000\0Mlicense_row_i"..., 16384, 0, NULL, NULL) = 168
poll([{fd=3, events=POLLIN|POLLERR}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
recvfrom(3, "N\0\0\0SSINFO\0C00000\0Mlicense_row_i"..., 16384, 0, NULL, NULL) = 1008
...

<does this for about 600k more lines before executing the rest of the script, I am assuming it is indefinitely looping until a timeout>

Next I ran a cProfile for a function that wrapped the second block of code that called the second function:

Tue Jan 15 16:04:50 2013    function2_prof

         7 function calls in 15755.988 CPU seconds

   Random listing order was used

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.000    0.000 {method 'cursor' of 'psycopg2._psycopg.connection' objects}
        1    0.000    0.000 15755.988 15755.988 test_sql_sub2.py:48(LR2)
        1 15755.988 15755.988 15755.988 15755.988 {method 'execute' of 'psycopg2._psycopg.cursor' objects}
        1    0.000    0.000 15755.988 15755.988 <string>:1(<module>)
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}
        1    0.000    0.000    0.000    0.000 {method 'close' of 'psycopg2._psycopg.cursor' objects}
        1    0.000    0.000    0.000    0.000 {method 'fetchall' of 'psycopg2._psycopg.cursor' objects}

Solution

  • The issue was due to a single line in multiple for-loops in the SQL function:

    raise info 'license_row_id := %', rec.license_row_id;
    

    These raise info messages were somehow not handled appropriately by python as they were by bash and pgAdmin. My guess is that because they were unhandled, they kept looping and processing and asking for more memory until it was unable to process, finally dropping the raise infos without warning or error and resuming the program.

    I commented out this needless raise info line and the whole script ran in 20 minutes, a normal run time for the program. If anyone has an idea how to make python handle raise info messages from postgresql that would probably even further answer this problem.