Access Instance of scrapy pipeline class

I want to access the variable self.cursor to make use of the active PostgreSQL connection, but I am unable to figure out how to access Scrapy's instance of the pipeline class.

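import os
import psycopg2

class ScrapenewsPipeline(object):

    def open_spider(self, spider):
        # Open one connection per spider run; other connect() arguments are omitted here.
        self.connection = psycopg2.connect(
            host=os.environ['HOST_NAME'],
        )
        self.cursor = self.connection.cursor()

    def close_spider(self, spider):
        # Release the cursor and the connection when the spider finishes.
        self.cursor.close()
        self.connection.close()

    def process_item(self, item, spider):
        print("Some Magic Happens Here")
        return item

    def checkUrlExist(self, item):
        print("I want to call this function from my spider to access the self.cursor variable")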

Please note, I realise I can reach process_item by yielding an item, but that function is doing other work. I want to use the connection via self.cursor in checkUrlExist and be able to call the pipeline instance from my spiders at will. Thank you.


  • Every pipeline method receives the spider instance as an argument, so inside the pipeline you can read any attribute of your spider class as spider.variable_name.

    import scrapy

    class MySpider(scrapy.Spider):
        name = "myspider"
        any_variable = "any_value"

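    Your pipeline:

    class MyPipeline(object):
        def process_item(self, item, spider):
            # Attributes defined on the spider are available through the spider argument.
            print(spider.any_variable)
            return item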

    I suggest creating the connection in your Spider class, just as I declared any_variable in my example. It will then be accessible in your spider as self.any_variable and in your pipelines as spider.any_variable.
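
The suggestion above could look roughly like the sketch below. It is an illustration under assumptions, not a drop-in implementation: sqlite3 from the standard library stands in for psycopg2 so the example runs without a database server, NewsSpider is a plain class standing in for a scrapy.Spider subclass, and the names NewsSpider, StoreUrlPipeline, url_exists and the articles table are all hypothetical. The last few lines imitate by hand the calls Scrapy would normally make.

```python
import sqlite3


class NewsSpider:
    """Stand-in for a scrapy.Spider subclass; it owns the database connection."""

    name = "news"

    def __init__(self):
        # Assumption: an in-memory sqlite3 database replaces psycopg2.connect(...).
        # Both drivers follow the Python DB-API, so the overall shape is the same.
        self.connection = sqlite3.connect(":memory:")
        self.cursor = self.connection.cursor()
        self.cursor.execute("CREATE TABLE articles (url TEXT PRIMARY KEY)")

    def url_exists(self, url):
        # The spider can query at will because the cursor is its own attribute.
        # sqlite3 uses "?" placeholders; psycopg2 would use "%s".
        self.cursor.execute("SELECT 1 FROM articles WHERE url = ?", (url,))
        return self.cursor.fetchone() is not None


class StoreUrlPipeline:
    """Pipeline that reuses the spider's connection instead of opening its own."""

    def process_item(self, item, spider):
        # The same connection is reached here through the spider argument.
        spider.cursor.execute(
            "INSERT OR IGNORE INTO articles (url) VALUES (?)", (item["url"],)
        )
        spider.connection.commit()
        return item

    def close_spider(self, spider):
        spider.connection.close()


# Imitate the calls Scrapy would make during a crawl.
spider = NewsSpider()
pipeline = StoreUrlPipeline()
before = spider.url_exists("https://example.com/a")
pipeline.process_item({"url": "https://example.com/a"}, spider)
after = spider.url_exists("https://example.com/a")
```

With this layout the spider checks for a URL through its own cursor, and the pipeline writes through spider.cursor, so both share a single connection.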