Tags: python, multithreading, twisted, scrapy, mysql-python

Scrapy mysql pipeline critical section in runInteraction()


I need help fixing a critical section in my Scrapy pipeline code.

I am using this MySQL pipeline in Scrapy (from http://snippets.scrapy.org/snippets/33/):

import datetime

import MySQLdb
import MySQLdb.cursors
from scrapy import log
from twisted.enterprise import adbapi


class SQLStorePipeline(object):

    def __init__(self):
        # adbapi runs blocking DB-API calls in a thread pool
        self.dbpool = adbapi.ConnectionPool('MySQLdb', db='mydb',
                user='myuser', passwd='mypass', cursorclass=MySQLdb.cursors.DictCursor,
                charset='utf8', use_unicode=True)

    def process_item(self, item, spider):
        # run db query in thread pool
        query = self.dbpool.runInteraction(self._conditional_insert, item)
        query.addErrback(self.handle_error)

        return item

    def _conditional_insert(self, tx, item):
        # create record if it doesn't exist.
        # this whole block runs in its own thread


        # START CRITICAL SECTION
        some_critical_code_here
        # STOP CRITICAL SECTION


        tx.execute("select * from websites where link = %s", (item['link'][0], ))
        result = tx.fetchone()
        if result:
            log.msg("Item already stored in db: %s" % item, level=log.DEBUG)
        else:
            tx.execute(
                "insert into websites (link, created) "
                "values (%s, %s)",
                (item['link'][0],
                 datetime.datetime.now())
            )
            log.msg("Item stored in db: %s" % item, level=log.DEBUG)

    def handle_error(self, e):
        log.err(e)

Everything works just fine.

As you can see, I already know where the critical section is in my code. But I am really new to Python and don't know how to use locks (or something similar) to prevent more than one thread from entering the critical section at a time.

Can you please help me? It would be great if you could send me code for entering and leaving the critical section that I can use here.

Thx guys.
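
For reference, the kind of locking asked about above could look roughly like the sketch below. It is not taken from the original pipeline: CriticalSectionDemo, seen_links and run_demo are hypothetical names, and an in-memory set stands in for the database so the example runs on its own. The only part that would carry over is the threading.Lock created once in __init__ and the `with` block wrapped around the critical code.

```python
import threading


class CriticalSectionDemo(object):
    """Hypothetical stand-in for the pipeline; only the locking is shown."""

    def __init__(self):
        # One lock shared by every thread that calls _conditional_insert.
        self._lock = threading.Lock()
        # Assumption: a set replaces the 'websites' table for this demo.
        self.seen_links = set()

    def _conditional_insert(self, link):
        # 'with' acquires the lock on entry and releases it on exit,
        # even if the body raises an exception.
        with self._lock:
            # START CRITICAL SECTION
            if link in self.seen_links:
                return False  # already stored
            self.seen_links.add(link)
            return True  # newly stored
            # STOP CRITICAL SECTION


def run_demo(n_threads=8):
    """Call _conditional_insert with the same link from several threads."""
    demo = CriticalSectionDemo()
    results = []

    def worker():
        results.append(demo._conditional_insert("http://example.com/"))

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

A lock like this only serializes threads inside one Python process. It does nothing for a second crawler process writing to the same database, so a constraint in the database itself may still be needed.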


Solution

  • I eventually sorted it out by merging the SQL statements in the critical section. Thanks to toothrot on the Scrapy IRC channel for the idea.
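
The post does not show the merged statement, so the following is only a guess at what such a merge might look like: the select and the insert are folded into a single "insert ... select ... where not exists" statement, so there is no gap between the check and the write in the Python code. As an assumption made to keep the example runnable without a MySQL server, it uses the standard-library sqlite3 module and `?` placeholders. The MySQLdb version would use `%s` placeholders and may differ in detail. The names MERGED_INSERT, make_demo_db, insert_if_missing and count_links are hypothetical.

```python
import datetime
import sqlite3

# Assumption: one statement replaces the separate select + insert.
MERGED_INSERT = (
    "insert into websites (link, created) "
    "select ?, ? "
    "where not exists (select 1 from websites where link = ?)"
)


def make_demo_db():
    """Create an in-memory database with a minimal 'websites' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table websites (link text, created text)")
    return conn


def insert_if_missing(conn, link):
    """Insert the link unless a row with the same link already exists."""
    created = datetime.datetime.now().isoformat()
    conn.execute(MERGED_INSERT, (link, created, link))
    conn.commit()


def count_links(conn, link):
    """Return how many rows are stored for the given link."""
    row = conn.execute(
        "select count(*) from websites where link = ?", (link,)
    ).fetchone()
    return row[0]
```

Calling insert_if_missing twice with the same link leaves a single row. Whether one statement is enough under concurrent writers depends on the database engine and isolation level. A unique index on link is the usual way to have the database enforce it.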