
Access the instance of a Scrapy pipeline class


I want to access the variable self.cursor to reuse the active PostgreSQL connection, but I am unable to figure out how to access Scrapy's instance of the pipeline class.

import os

import psycopg2


class ScrapenewsPipeline(object):

    def open_spider(self, spider):
        # Open one connection per crawl; credentials come from the environment.
        self.connection = psycopg2.connect(
            host=os.environ['HOST_NAME'],
            user=os.environ['USERNAME'],
            database=os.environ['DATABASE_NAME'],
            password=os.environ['PASSWORD'])
        self.cursor = self.connection.cursor()
        self.connection.set_session(autocommit=True)

    def close_spider(self, spider):
        self.cursor.close()
        self.connection.close()

    def process_item(self, item, spider):
        print("Some Magic Happens Here")
        return item  # a pipeline must return the item (or raise DropItem)

    def checkUrlExist(self, item):
        print("I want to call this function from my spider to access the self.cursor variable")

Please note: I realise I can reach process_item by using yield item, but that function is doing other work. I want to use the connection via self.cursor in checkUrlExist and be able to call the pipeline instance from my spiders at will. Thank you.


Solution

  • Every pipeline method receives the spider instance as an argument, so you can read any attribute of your spider class there as spider.variable_name.

    class MySpider(scrapy.Spider):
        name = "myspider"
        any_variable = "any_value"

    Your pipeline:

    class MyPipeline(object):
        def process_item(self, item, spider):
            # The spider's attributes are available on the spider argument.
            spider.any_variable
            return item

    I suggest creating the connection in your Spider class, the same way any_variable is declared in the example. It will then be accessible in your spider as self.any_variable and in your pipelines as spider.any_variable.
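
The suggestion above might look roughly like the following. This is only a sketch under some assumptions: to stay runnable without a PostgreSQL server or a Scrapy install, it uses sqlite3 as a stand-in for psycopg2 and plain classes in place of scrapy.Spider, and the seen_urls table and the check_url_exists name are hypothetical. With psycopg2 the placeholders would be %s instead of ?, and the INSERT OR IGNORE statement would need a PostgreSQL equivalent.

```python
import sqlite3  # stand-in for psycopg2 so the sketch runs without a server


class MySpider:  # in a real project this would subclass scrapy.Spider
    name = "myspider"

    def __init__(self):
        # Assumption: the spider owns one connection for its whole lifetime.
        self.connection = sqlite3.connect(":memory:")
        self.cursor = self.connection.cursor()
        # Hypothetical table, created here only so the sketch is self-contained.
        self.cursor.execute("CREATE TABLE seen_urls (url TEXT PRIMARY KEY)")

    def check_url_exists(self, url):
        # Callable from any spider callback as self.check_url_exists(url).
        self.cursor.execute("SELECT 1 FROM seen_urls WHERE url = ?", (url,))
        return self.cursor.fetchone() is not None

    def closed(self, reason):
        # Scrapy calls closed() on the spider when the crawl ends.
        self.cursor.close()
        self.connection.close()


class MyPipeline:
    def process_item(self, item, spider):
        # The pipeline reuses the spider's cursor instead of opening its own.
        spider.cursor.execute(
            "INSERT OR IGNORE INTO seen_urls (url) VALUES (?)", (item["url"],)
        )
        return item


# Simulate by hand what Scrapy would do during a crawl.
spider = MySpider()
pipeline = MyPipeline()
before = spider.check_url_exists("https://example.com/a")
pipeline.process_item({"url": "https://example.com/a"}, spider)
after = spider.check_url_exists("https://example.com/a")
spider.closed("finished")
```

With this layout the spider never needs a reference to the pipeline instance: the lookup lives on the spider, and the pipeline reaches the same cursor through the spider argument it is already given.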