Search code examples
pythonpostgresqlpsycopg2

How would I get the execution time of copy_from() function when using psycopg2?


Now I am using psycopg2 to connect to postgresql and use the copy function to load the local txt data into the database.

Currently my code is as follows:

#!/usr/bin/env python
import time
import psycopg2
import psycopg2.extensions
from psycopg2.extras import LoggingConnection, LoggingCursor
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

class MyLoggingCursor(LoggingCursor):
    def execute(self, query, vars=None):
        self.timestamp = time.time()
        return super(MyLoggingCursor, self).execute(query, vars)

    def callproc(self, procname, vars=None):
        self.timestamp = time.time()
        return super(MyLoggingCursor, self).callproc(procname, vars)

class MyLoggingConnection(LoggingConnection):
    def filter(self, msg, curs):
        return msg + "   %d ms" % int((time.time() - curs.timestamp) * 1000)

    def cursor(self, *args, **kwargs):
        kwargs.setdefault('cursor_factory', MyLoggingCursor)
        return LoggingConnection.cursor(self, *args, **kwargs)

db_settings = {
    "database": "gps_stat",
    "user": "user",
    "password": "test@123",
    "host": "localhost",
    "port": 26000
}

target_txt = "test.txt"
table_name = 'gps_base'

start_time=time.time()
conn = psycopg2.connect(connection_factory=MyLoggingConnection, **db_settings)
conn.initialize(logger)
cur = conn.cursor('cursor_unique_name', cursor_factory=psycopg2.extras.DictCursor)
f = open(target_txt, 'r')
cur.copy_from(f, table_name, sep=',')
conn.commit()
conn.close()
f.close()
print("Time:{}ms".format(time.time()-start_time))

But the MyLoggingConnection does not meet my needs.

There are about 200 million pieces of data in the test_txt. After the above program runs, it prints Time: 2000ms, but in fact I waited about 20 minutes to get the result of Time: 2000ms.

What can I do to get the real execution time of the copy_from() function?


Solution

  • Initially I had same issue and then I have written small program to verify this and came to know this

    start_time=time.time()
    
    time.sleep(3)
    
    print("Total Time Taken :: {} ms".format(time.time()-start_time))
    

    and output is

    Total Time Taken :: 3.010446071624756 ms

    It means, time which you are getting is in seconds as I am putting time on sleep for 3 sec. If it is in ms then, result should have been 3000+ ms whereas result is 3.010 which is nothing but sec. I believe your execution is taking total 2000 sec.

    happy coding!