I am using my own pipeline to store the scraped items in a PostgreSQL database. I expanded the setup a few days ago and I now store the data in three databases. So, I want the pipeline that inserts the data to commit every 100 items, i.e. to take the items and insert them 100 at a time.
The reason is to make it fast and put less load on the DB servers.
The solution is not that different from Anandhakumar's answer: I created a global list in the settings file with setter and getter functions for it.
# This buffer is for the bulk insertion
products_buffer = []

# Append a product to the list
def add_to_products_buffer(product):
    global products_buffer
    products_buffer.append(product)

# Get the length of the products list
def get_products_buffer_len():
    global products_buffer
    return len(products_buffer)

# Get the products list
def get_products_buffer():
    global products_buffer
    return products_buffer

# Empty the list
def empty_products_buffer():
    global products_buffer
    products_buffer[:] = []
Then I imported them in the pipeline:

from project.settings import products_buffer, add_to_products_buffer, get_products_buffer_len, empty_products_buffer, get_products_buffer
I append the item to the list every time the pipeline is called, and when the length of the list reaches 100 I loop over it to prepare the insert queries. The most important part is to commit them all in one go: don't commit inside the loop, or you won't gain anything and it will take a long time to insert them all.
def process_item(self, item, spider):
    # Add the item to the buffer
    add_to_products_buffer(item)
    # Check if the buffer has reached 100 items
    if get_products_buffer_len() == 100:
        # Get the list to loop over it
        products_list = get_products_buffer()
        for product in products_list:
            # The insert query for each product
            self.cursor.execute('insert query')
        try:
            # Commit all the insert queries in one go
            self.conn.commit()
            # Empty the list
            empty_products_buffer()
        except Exception as e:
            # Roll back the failed transaction and report the error
            self.conn.rollback()
            spider.logger.error('Bulk insert failed: %s', e)
    return item
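If you want to be thorough, the items still sitting in the buffer when the spider finishes (anything less than a full batch of 100) also need to be flushed. A minimal sketch in close_spider, reusing the same connection attributes and the same placeholder insert query:

def close_spider(self, spider):
    # Flush whatever is still in the buffer (fewer than 100 items)
    if get_products_buffer_len() > 0:
        for product in get_products_buffer():
            # Same placeholder insert query as above
            self.cursor.execute('insert query')
        try:
            self.conn.commit()
            empty_products_buffer()
        except Exception as e:
            self.conn.rollback()
            spider.logger.error('Final flush failed: %s', e)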
You can also use executemany if you don't want to loop.
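For example, a minimal sketch of the executemany variant, assuming psycopg2 and a hypothetical products table with name and price columns (the table and field names are placeholders, adjust the query to your own schema):

def process_item(self, item, spider):
    add_to_products_buffer(item)
    if get_products_buffer_len() == 100:
        # Build one parameter tuple per buffered item
        rows = [(p['name'], p['price']) for p in get_products_buffer()]
        try:
            # A single executemany call replaces the explicit loop
            self.cursor.executemany(
                'INSERT INTO products (name, price) VALUES (%s, %s)', rows)
            self.conn.commit()
            empty_products_buffer()
        except Exception as e:
            self.conn.rollback()
            spider.logger.error('Bulk insert failed: %s', e)
    return item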