I need to insert 46 million points into a PostGIS database in a reasonable time. Inserting 14 million points took around 40 minutes, which is awful and inefficient.
I created a database with a spatial GiST index and wrote this code:
import psycopg2
import time
start = time.time()
conn = psycopg2.connect(host='localhost', port='5432', dbname='test2', user='postgres', password='alfabet1')
filepath = "C:\\Users\\nmt1m.csv"
curs = conn.cursor()
with open(filepath, 'r') as text:
    for i in text:
        i = i.replace("\n", "")
        i = i.split(sep=" ")
        curs.execute(f"INSERT INTO nmt_1 (geom, Z) VALUES (ST_GeomFromText('POINTZ({i[0]} {i[1]} {i[2]})',0), {i[2]});")
conn.commit()
end = time.time()
print(end - start)
curs.close()
conn.close()
I'm looking for the best way to insert the data; it doesn't have to be in Python.
Thanks ;)
Cześć (hi)! Welcome to SO.
There are a few things you can do to speed up your bulk insert:
If the target table is empty or is not being used in a production system, consider dropping the indexes right before inserting the data; you can recreate them once the insert is complete. This prevents PostgreSQL from updating the index after every single insert, which in your case means 46 million times.
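A minimal sketch of that workflow, reusing the conn and curs objects from your script. The index name nmt_1_geom_idx is an assumption; check the real name with \d nmt_1 in psql:

# Drop the spatial index before the bulk load (index name assumed).
curs.execute("DROP INDEX IF EXISTS nmt_1_geom_idx;")
conn.commit()

# ... run the bulk load here ...

# Rebuild the GiST index once, after all 46 million rows are in place.
curs.execute("CREATE INDEX nmt_1_geom_idx ON nmt_1 USING GIST (geom);")
conn.commit()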
If the target table can be entirely built from your CSV file, consider creating an UNLOGGED TABLE. Unlogged tables are much faster than "normal" tables, since they (as the name suggests) are not logged in the WAL (write-ahead log). Be aware, though, that unlogged tables might be lost in case of a database crash or an unclean shutdown!
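For example (column names taken from your INSERT statement; ALTER TABLE ... SET LOGGED needs PostgreSQL 9.5 or newer), assuming you recreate the table from scratch:

# Create the table as UNLOGGED so the load skips the write-ahead log.
# geometry(PointZ) defaults to SRID 0, matching your ST_GeomFromText(..., 0).
curs.execute("""
    CREATE UNLOGGED TABLE nmt_1 (
        geom geometry(PointZ),
        z    double precision
    );
""")
conn.commit()

# ... bulk load ...

# Optionally turn it back into a regular, crash-safe table afterwards.
curs.execute("ALTER TABLE nmt_1 SET LOGGED;")
conn.commit()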
Use either the PostgreSQL COPY command or psycopg2's copy_from, as @MauriceMeyer pointed out. If for some reason you must stick to INSERTs, make sure you're not committing after every insert ;-)
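A rough sketch of the COPY route, again reusing your connection. The staging table name nmt_staging is made up, and the file is assumed to be space-separated x y z, as in your loop:

# Staging table that matches the raw CSV columns.
curs.execute("CREATE UNLOGGED TABLE IF NOT EXISTS nmt_staging (x float8, y float8, z float8);")
conn.commit()

# Stream the whole file through a single COPY instead of 46 million INSERTs.
with open("C:\\Users\\nmt1m.csv", 'r') as f:
    curs.copy_expert(
        "COPY nmt_staging (x, y, z) FROM STDIN WITH (FORMAT text, DELIMITER ' ')",
        f)
conn.commit()

# Build the final geometries in one set-based statement.
# ST_MakePoint(x, y, z) produces a 3D point with SRID 0, like your original WKT.
curs.execute("""
    INSERT INTO nmt_1 (geom, Z)
    SELECT ST_MakePoint(x, y, z), z
    FROM nmt_staging;
""")
conn.commit()

curs.copy_from(f, 'nmt_staging', sep=' ') would do the same job as the copy_expert call; either way, a single COPY plus a single commit is where most of the speed-up comes from.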
Cheers