Tags: python, pandas, function, return

Why are my variables not accessible after a function call?


I can't figure out why the changes my function makes to its variables aren't available after I execute it, or why the variables aren't accessible after the function has run. I pass in a dataframe and tell the function which column to compare. I want the function to add the matching values to the original dataframe and also create a separate dataframe in which I can see just the matches. When I run the code I can see the dataframe and the matching dataframe right after the function runs, but when I then try to use the matching dataframe, Python doesn't recognize the variable as defined, and the original dataframe isn't modified when I look at it again. I've tried declaring them both as global variables at the beginning of the function, but that didn't work either.

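import numpy as np
import pandas as pd
# `process` and `scorer_dict` are not defined in this snippet; `process` is
# assumed to come from a fuzzy-matching library imported elsewhere.

def scorer_tester_function(dataframe, score_type, source, compare, limit_num):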
    match = []
    match_index = []
    similarity = []
    org_index = []
    match_df = pd.DataFrame()

    for i in zip(source.index, source):
        position = list(source.index)
        print(str(position.index(i[0])) + " of " + str(len(position)))
        if pd.isnull(i[1]):
            org_index.append(i[0])
            match.append(np.nan)
            similarity.append(np.nan)
            match_index.append(np.nan)
        else:
            ratio = process.extract(i[1], compare, limit=limit_num,
                                    scorer=scorer_dict[score_type])
            org_index.append(i[0])
            match.append(ratio[0][0])
            similarity.append(ratio[0][1])
            match_index.append(ratio[0][2])
    match_df['org_index'] = pd.Series(org_index)
    match_df['match'] = pd.Series(match)
    match_df['match_index'] = pd.Series(match_index)
    match_df['match_score'] = pd.Series(similarity)
    match_df.set_index('org_index', inplace=True)
    dataframe = pd.concat([dataframe, match_df], axis=1)
    return match_df, dataframe

I'm calling the function like this:

scorer_tester_function(df_ven, 'WR', df_ven['Name 1'].sample(2), df_emp['Name 2'], 1)

My expectation is that I can access match_df and df_ven afterwards and then view and further manipulate them, but after the call the original dataframe df_ven is unchanged and referencing match_df gives a "name is not defined" error.


Solution

  • return doesn't inject local variables into the caller's scope; it makes the function call evaluate to their values.

    If you write

    a, b = scorer_tester_function(df_ven, 'WR', df_ven['Name 1'].sample(2), df_emp['Name 2'], 1)
    

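    then a will have the value of match_df from inside the function and b will have the value of dataframe, but the names match_df and dataframe go out of scope after the function returns; they do not exist outside of it. For the same reason df_ven is unchanged: the line dataframe = pd.concat([dataframe, match_df], axis=1) binds the local name dataframe to a new object rather than modifying the one that was passed in. To keep the combined result, assign the second return value back to df_ven.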
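
The scoping behaviour described above can be sketched without pandas. In this minimal example, plain lists stand in for the dataframes, and add_scores, rows, scores, found and updated are hypothetical names chosen for illustration; it is not the asker's function, only the same pattern of a local result plus a rebound parameter.

```python
def add_scores(rows, scores):
    """Hypothetical stand-in for scorer_tester_function, using lists."""
    matches = [s for s in scores if s >= 90]  # local name, like match_df
    rows = rows + matches  # rebinds the local name; the caller's list is untouched
    return matches, rows


original = [1, 2, 3]

# Calling without assigning the result discards both return values.
add_scores(original, [95, 40])
unchanged = list(original)  # copy of the caller's list after the call

# The function's local name `matches` does not exist in the caller's scope.
try:
    matches
    leaked = True
except NameError:
    leaked = False

# Capture the results by binding them to names in the caller's scope.
found, updated = add_scores(original, [95, 40])
```

After the last line, found and updated hold the two returned values, while original still refers to the untouched input list. Writing original = updated (or unpacking straight into original) is what makes the change visible under the old name.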