Tags: python, pandas, dataframe, multiprocessing, python-multiprocessing

Python pandas multiprocessing: no result returned


I have a DataFrame. You can reproduce it by copying and pasting this:

import pandas as pd
from io import StringIO

df = """
  ValOption  RB test 
0       SLA  4  3       
1       AC   5  4           
2       SLA  5  5          
3       AC   2  4       
4       SLA  5  5         
5       AC   3  4          
6       SLA  4  3         

"""
df = pd.read_csv(StringIO(df.strip()), sep=r'\s+')

Output:

ValOption   RB  test
0     SLA   4   3
1     AC    5   4
2     SLA   5   5
3     AC    2   4
4     SLA   5   5
5     AC    3   4
6     SLA   4   3

Then I have 2 functions to build new columns for this df:

def func1():
    df['r1']=df['test']+1
    return df['r1']

def func2():
    df['r2']=df['RB']+1
    return df['r2']

After I call these 2 functions:

func1()
func2()

Output:

ValOption   RB  test    r1  r2
0    SLA    4   3      4    5
1     AC    5   4      5    6
2     SLA   5   5      6    6
3     AC    2   4      5    3
4     SLA   5   5      6    6
5     AC    3   4      5    4
6     SLA   4   3      4    5

But when I use multiprocessing, the new columns are missing:

import multiprocessing
if __name__ ==  '__main__':

    p1 = multiprocessing.Process(target=func1)
    p2 = multiprocessing.Process(target=func2)

    p1.start()
    p2.start()

    p1.join()
    p2.join()

Output:

ValOption   RB  test
0    SLA    4   3
1     AC    5   4
2    SLA    5   5
3     AC    2   4
4    SLA    5   5
5     AC    3   4
6    SLA    4   3

With multiprocessing, the values computed in the functions never show up in df. Can anyone help?


Solution

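  • Each Process runs with its own copy of the parent's memory, so a column assigned to df inside a child process never reaches the df in the parent. Return the new columns from the workers and assign them in the parent instead. Keeping the work in a class, that could look like this:

    from multiprocessing import Pool

    class Test:
        def __init__(self, df):
            self.df = df

        def func1(self):
            # Return the new column instead of mutating df in the child
            return self.df['test'] + 1

        def func2(self):
            return self.df['RB'] + 1

    if __name__ == '__main__':
        t = Test(df)
        with Pool(processes=2) as pool:
            # Pass the bound methods themselves (no parentheses), so they
            # run in the worker processes rather than in the parent
            r1 = pool.apply_async(t.func1)
            r2 = pool.apply_async(t.func2)
            # .get() blocks until each worker has sent its result back
            df['r1'] = r1.get()
            df['r2'] = r2.get()

    For a frame this small, the cost of starting processes and pickling df will likely outweigh any gain over calling the two functions directly.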
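
If you would rather stay with multiprocessing.Process, as in the question, the children need an explicit channel to send results back, because changes they make to df stay in their own process. Below is a minimal sketch using a multiprocessing.Queue. The names compute_r1, compute_r2 and add_columns_in_parallel are made up for this example, and the frame is rebuilt from a dict only to keep the snippet self-contained.

```python
import multiprocessing as mp

import pandas as pd


def compute_r1(df, queue):
    # Runs in a child process: send the result back instead of mutating df.
    queue.put(("r1", df["test"] + 1))


def compute_r2(df, queue):
    queue.put(("r2", df["RB"] + 1))


def add_columns_in_parallel(df):
    """Return a copy of df with r1 and r2 computed in two child processes."""
    queue = mp.Queue()
    workers = [
        mp.Process(target=compute_r1, args=(df, queue)),
        mp.Process(target=compute_r2, args=(df, queue)),
    ]
    for w in workers:
        w.start()

    # Read the results before join(): a child that has put data on a
    # queue may block at exit until that data has been flushed.
    results = dict(queue.get() for _ in workers)

    for w in workers:
        w.join()

    # Assign the columns in the parent, where the change is visible.
    out = df.copy()
    for name in sorted(results):
        out[name] = results[name]
    return out


if __name__ == "__main__":
    df = pd.DataFrame({
        "ValOption": ["SLA", "AC", "SLA", "AC", "SLA", "AC", "SLA"],
        "RB": [4, 5, 5, 2, 5, 3, 4],
        "test": [3, 4, 5, 4, 5, 4, 3],
    })
    print(add_columns_in_parallel(df))
```

Each result is tagged with its column name because the two children may finish in either order. The worker functions are defined at module level so they can be pickled on platforms that start child processes with spawn.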