Python: How to use class methods in multiprocessing?


I use the following class to parse an individual URL:

product = Product(links[0], user_agents)
result = product.parse()

and class code:

class Product:
    soup = None
    url = None

    def __init__(self, url, user_agents):
        self.url = url
        print('Class Initiated with URL: {}'.format(url))
        # Randomize the user agent
        user_agent = get_random_user_agent(user_agents)
        user_agent = user_agent.rstrip('\n')

        if 'linux' in user_agent.lower():
            sec_ch_ua_platform = 'Linux'
        elif 'mac os x' in user_agent.lower():
            sec_ch_ua_platform = 'macOS'
        else:
            sec_ch_ua_platform = 'Windows'

        headers = {
            
        }
        r = create_request(url, None, headers=headers, is_proxy=False)
        if r is None:
            raise ValueError('Could not get data')
        html = r.text.strip()
        self.soup = BeautifulSoup(html, 'lxml')

    def parse(self):
        record = {}
        name = ''
        price = 0
        user_count_in_cart = 0
        review_count = 0
        rating = 0
        is_personalized = 'no'

        try:
            name = self.get_name()
            price = self.get_price()
            is_pick = self.get_is_pick()

Now I want to call parse() using multiprocessing. How do I do it? For a single record, I currently do this:

product = Product(links[0], user_agents)
result = product.parse()

Solution

  • With the current class, you can write a wrapper function that takes a URL, creates product = Product(url, ...), and calls product.parse(). You can then pass this new function to .map(new_function, links).

    Something like this:

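    import multiprocessing

    def check(url):
        # Build and parse one Product inside the worker process
        product = Product(url, user_agents)
        return product.parse()

    if __name__ == '__main__':
        # The guard is required where workers are started with spawn (Windows, macOS)
        with multiprocessing.Pool() as p:
            results = p.map(check, links)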
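
If you want to try the wrapper approach without touching the network, here is a minimal self-contained sketch. The Product class in it is a hypothetical stand-in for the one in the question (it does no request and only derives a name from the URL), and the example links and user agent string are made up. It also assumes you would rather pass user_agents explicitly than rely on a global, which is done here with functools.partial.

```python
import multiprocessing
from functools import partial


class Product:
    """Hypothetical stand-in for the question's Product class (no network access)."""

    def __init__(self, url, user_agents):
        self.url = url
        # The real class downloads the page here; this sketch only keeps one agent.
        self.user_agent = user_agents[0]

    def parse(self):
        # Return a plain dict so the result can be pickled back to the parent.
        return {"url": self.url, "name": self.url.rsplit("/", 1)[-1]}


def check(url, user_agents):
    # Runs inside a worker process: build the object there, then parse it.
    return Product(url, user_agents).parse()


def main():
    # Made-up example inputs, not taken from the question.
    links = ["https://example.com/item/a", "https://example.com/item/b"]
    user_agents = ["ExampleAgent/1.0"]
    # partial() binds the second argument so map() only has to supply the URL.
    worker = partial(check, user_agents=user_agents)
    with multiprocessing.Pool(processes=2) as pool:
        results = pool.map(worker, links)
    for record in results:
        print(record)


if __name__ == "__main__":
    main()
```

Pool.map returns the results in the same order as links. Because each Product is created inside the worker, only the URL, the user agent list, and the returned dict have to be pickled between processes, not the Product object itself.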