I have set up Celery + RabbitMQ on a 3-machine cluster. I have also created a task that generates a regular expression based on data from a file and uses it to parse text. However, I would like the file to be read only once per worker spawn, not on every execution of the task.
    import re

    from celery import Celery

    celery = Celery('tasks', broker='amqp://localhost//')

    @celery.task
    def add(x, y):
        return x + y

    def get_regular_expression():
        # Build an alternation pattern from the third column of each line
        with open("text") as fp:
            data = fp.readlines()
        str_re = "|".join([x.split()[2] for x in data])
        return str_re

    @celery.task
    def analyse_json(tw):
        str_re = get_regular_expression()
        return re.match(str_re, tw.text)
In the above code, I would like to open the file and read its contents into the string only once per worker, so that the task analyse_json just uses the string.
Any help will be appreciated,
thanks, Amit
Put the call to get_regular_expression at the module level:
    str_re = get_regular_expression()

    @celery.task
    def analyse_json(tw):
        return re.match(str_re, tw.text)
It will only be called once, when the module is first imported.
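If you would rather not do the file read at import time, the same once-per-process behaviour can be had by caching the helper with functools.lru_cache from the standard library. This is a minimal sketch in plain Python, independent of Celery; the sample file and its contents are made up here for illustration (the question's real file is named "text" and has at least three whitespace-separated columns per line):

```python
import functools
import tempfile

# Create a small sample file so the sketch is self-contained
# (hypothetical data standing in for the question's "text" file).
sample = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
sample.write("a b foo\nc d bar\n")
sample.close()

@functools.lru_cache(maxsize=1)
def get_regular_expression(path=sample.name):
    # The body runs only on the first call in this process;
    # subsequent calls return the cached string without touching the file.
    with open(path) as fp:
        return "|".join(line.split()[2] for line in fp)

first = get_regular_expression()
second = get_regular_expression()
assert first is second  # same cached object, file read once
print(first)            # foo|bar
```

Inside the task you would then call get_regular_expression() as before; the file I/O happens only on the first call in each worker process.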
Additionally, if you must have only one instance of your worker running at a time (for example, when it holds a CUDA context), use the solo pool with the -P (--pool) option:
celery worker --pool solo
Works with celery 4.4.2.