I want to access the variable `self.cursor` to make use of the active PostgreSQL connection, but I am unable to figure out how to access Scrapy's instance of the pipeline class.
```python
import os

import psycopg2


class ScrapenewsPipeline(object):
    def open_spider(self, spider):
        # Open one PostgreSQL connection for the whole spider run
        self.connection = psycopg2.connect(
            host=os.environ['HOST_NAME'],
            user=os.environ['USERNAME'],
            database=os.environ['DATABASE_NAME'],
            password=os.environ['PASSWORD'])
        self.cursor = self.connection.cursor()
        self.connection.set_session(autocommit=True)

    def close_spider(self, spider):
        self.cursor.close()
        self.connection.close()

    def process_item(self, item, spider):
        print("Some Magic Happens Here")
        return item  # a pipeline must return the item (or raise DropItem)

    def checkUrlExist(self, item):
        print("I want to call this function from my spider to access "
              "the self.cursor variable")
```
Please note, I realise I can reach `process_item` by using `yield item`, but that method is doing other things. I want to use the connection via `self.cursor` in `checkUrlExist` and be able to call that instance of the class from my spiders at will.
Thank you.
You can access any of your spider's class variables from a pipeline with `spider.variable_name`, because Scrapy passes the running spider to every pipeline method.

Your spider:
```python
import scrapy


class MySpider(scrapy.Spider):
    name = "myspider"
    any_variable = "any_value"
```
Your pipeline:

```python
class MyPipeline(object):
    def process_item(self, item, spider):
        value = spider.any_variable  # reads the attribute defined on MySpider
        return item
```
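The same `spider` argument also works in the other direction, which is closer to what the question asks: a pipeline can attach itself to the spider in `open_spider`, after which the spider's callbacks can call `checkUrlExist` through that attribute. Below is a minimal sketch of the idea, not a tested Scrapy project. To keep it runnable without Scrapy or a PostgreSQL server, it uses the standard-library `sqlite3` module in place of `psycopg2`, a plain class in place of `scrapy.Spider`, and calls the hooks by hand in roughly the order Scrapy would. The `pipeline` attribute, the `articles` table and the `parse_article` callback are hypothetical names. With psycopg2 the query placeholders would be `%s` instead of `?`.

```python
import sqlite3


class ScrapenewsPipeline:
    def open_spider(self, spider):
        # Stand-in for psycopg2.connect(...): an in-memory SQLite database
        self.connection = sqlite3.connect(":memory:")
        self.cursor = self.connection.cursor()
        self.cursor.execute("CREATE TABLE articles (url TEXT PRIMARY KEY)")
        # Hypothetical attribute: hand this pipeline instance to the spider
        spider.pipeline = self

    def close_spider(self, spider):
        self.cursor.close()
        self.connection.close()

    def process_item(self, item, spider):
        self.cursor.execute(
            "INSERT OR IGNORE INTO articles (url) VALUES (?)", (item["url"],))
        return item

    def checkUrlExist(self, item):
        # Uses the pipeline's own cursor, as the question wants
        self.cursor.execute("SELECT 1 FROM articles WHERE url = ?", (item["url"],))
        return self.cursor.fetchone() is not None


class NewsSpider:
    # Plain class standing in for scrapy.Spider
    name = "news"

    def parse_article(self, url):
        # Hypothetical callback: skip URLs that are already stored
        item = {"url": url}
        if self.pipeline.checkUrlExist(item):
            return None
        return item


# Simulate Scrapy: open the pipeline, run a callback, feed the item back
spider = NewsSpider()
pipeline = ScrapenewsPipeline()
pipeline.open_spider(spider)
first = spider.parse_article("https://example.com/a")
pipeline.process_item(first, spider)
second = spider.parse_article("https://example.com/a")
print(first, second)  # {'url': 'https://example.com/a'} None
```

Because the attribute is only set in `open_spider`, the spider should use it from its callbacks rather than from its own `__init__`.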
I suggest you create the connection in your Spider class, just as I declared `any_variable` in my example. It will then be accessible in your spider via `self.any_variable` and in your pipelines via `spider.any_variable`.
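Following that suggestion, here is a minimal sketch in which the spider owns the connection and the pipeline borrows it through `spider`. It opens the connection in `__init__` rather than as a class attribute, so that nothing connects at import time, and closes it in `closed(reason)`, which Scrapy calls on a spider when it finishes. As in any sketch here, `sqlite3` and a plain class stand in for `psycopg2` and `scrapy.Spider`; `url_exists` and the `articles` table are hypothetical names. A real `scrapy.Spider` subclass would also forward `*args, **kwargs` to `super().__init__`.

```python
import sqlite3


class NewsSpider:
    # Plain class standing in for scrapy.Spider
    name = "news"

    def __init__(self, *args, **kwargs):
        # A real scrapy.Spider subclass would call super().__init__(*args, **kwargs)
        # Stand-in for psycopg2.connect(...); one connection per spider instance
        self.connection = sqlite3.connect(":memory:")
        self.cursor = self.connection.cursor()
        self.cursor.execute("CREATE TABLE articles (url TEXT PRIMARY KEY)")

    def url_exists(self, url):
        # Hypothetical helper, callable from callbacks as self.url_exists(...)
        self.cursor.execute("SELECT 1 FROM articles WHERE url = ?", (url,))
        return self.cursor.fetchone() is not None

    def closed(self, reason):
        # In Scrapy, Spider.closed(reason) runs when the spider finishes
        self.cursor.close()
        self.connection.close()


class ScrapenewsPipeline:
    def process_item(self, item, spider):
        # Reuse the spider's cursor instead of opening a second connection
        if not spider.url_exists(item["url"]):
            spider.cursor.execute(
                "INSERT INTO articles (url) VALUES (?)", (item["url"],))
        return item


spider = NewsSpider()
pipeline = ScrapenewsPipeline()
pipeline.process_item({"url": "https://example.com/a"}, spider)
print(spider.url_exists("https://example.com/a"))  # True
print(spider.url_exists("https://example.com/b"))  # False
```

With this layout the pipeline no longer needs its own `open_spider`/`close_spider` for the database, since the connection's lifetime is tied to the spider.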