python pandas dataframe indexing global-variables

python / pandas - MultiIndexing - eliminate the use of global variables


I am using pandas to import a DataFrame from Excel so that I can sort it, make changes, and run some simple addition and division on the data.

My code works, but it uses global variables throughout. I think this is poor practice and I want to eliminate them, but I am not sure how to go about it.

I'm not sure how I can further modify my dataframe with indexing and slicing without declaring global variables.

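import pandas as pd

df = pd.read_excel('data.xlsx')

# drop=False keeps 'apple' available as a column after it becomes an index level
new_indexes = df.set_index(['apple', 'cherry', 'banana'], drop=False)

new_indexes['apples and cherries'] = new_indexes['apple'] + new_indexes['cherries']

# rows whose second index level equals 'fruits'
sliced = new_indexes.loc(axis=0)[pd.IndexSlice[:, 'fruits']]

# several column labels must be passed to .loc as a list
total_fruits = sliced.loc[:, ['grapes', 'watermelon', 'orange']].sum(axis=1)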

That's a snippet of my code. As you can see, each step refers to a global variable created by the previous step in order to modify the dataframe further. I want to eliminate these global variables, and I am trying to create functions to help clean up my code.

My main question is: how can I refer to my data and the changes made to it without assigning them to global variables?

If I defined a class and reassigned the variables to properties, would I be able to do something like this?

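class MyDf:

    def __init__(self):
        self._df = None
        self._multi_index = None

    def get_df(self):
        # store the frame on the instance so the other methods can reach it
        self._df = pd.read_excel('data.xlsx')
        return self._df

    def set_index(self):
        # drop=False keeps 'apple' available as a column for add_totals
        self._multi_index = self._df.set_index(['apple', 'cherry', 'banana'], drop=False)

    def add_totals(self):
        self._multi_index['apples and cherries'] = self._multi_index['apple'] + self._multi_index['cherries']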

 

Thank you


Solution

  • There are several things you could do, depending on the overall structure of your code and your goal. Without knowing more about your case (for example, how the snippet you provided fits into the rest of your code), these are only possible approaches.

    You could define a function that takes a dataframe as an argument, performs the operations on it, and returns the modified dataframe. The function could also simply take a filename as an argument, so that the dataframe is created inside the function to begin with. If you do not need to refer to intermediate variables such as new_indexes or sliced later in the code, a function is probably a good way to go.
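As a rough sketch of the function approach, the steps from the question could be split up like this. The function names, the 'total fruits' column and the sample rows in sample_frame are invented for illustration; only the column labels come from the question. The sketch also assumes the totals can be computed before set_index is called, since columns passed to set_index become index levels and are no longer regular columns.

```python
import pandas as pd


def add_totals(df):
    """Return a copy of df with two total columns added; df itself is untouched."""
    return df.assign(**{
        'apples and cherries': df['apple'] + df['cherries'],
        'total fruits': df[['grapes', 'watermelon', 'orange']].sum(axis=1),
    })


def index_by_fruit(df):
    """Move the three key columns into a MultiIndex."""
    return df.set_index(['apple', 'cherry', 'banana'])


def select_group(indexed, label):
    """Keep the rows whose second index level ('cherry') equals label."""
    return indexed.loc(axis=0)[pd.IndexSlice[:, label]]


def process(df, label='fruits'):
    """Chain the steps; every intermediate result stays local to this call."""
    return select_group(index_by_fruit(add_totals(df)), label)


def sample_frame():
    """Made-up rows standing in for pd.read_excel('data.xlsx')."""
    return pd.DataFrame({
        'apple': [1, 2, 3],
        'cherry': ['fruits', 'other', 'fruits'],
        'banana': [10, 20, 30],
        'cherries': [4, 5, 6],
        'grapes': [1, 1, 1],
        'watermelon': [2, 2, 2],
        'orange': [3, 3, 3],
    })


if __name__ == '__main__':
    print(process(sample_frame()))
```

With real data the call would be something like process(pd.read_excel('data.xlsx')). Nothing is stored at module level except the functions themselves.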

    You could also define a class, turn the variables into attributes of its instances, and write methods that perform the operations you need. The advantage is that the intermediate results remain accessible through the object whenever you need them.
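A minimal sketch of the class approach might look like the following. The class name FruitTable, its method names and the sample rows are hypothetical; the column labels are the ones from the question. Each method returns self so that calls can be chained, which is a design choice rather than a requirement.

```python
import pandas as pd


class FruitTable:
    """Hypothetical wrapper that keeps the frame and each intermediate result."""

    INDEX_COLUMNS = ['apple', 'cherry', 'banana']

    def __init__(self, df):
        self.df = df          # raw data, e.g. the result of pd.read_excel
        self.indexed = None   # set by set_index()
        self.sliced = None    # set by slice_group()

    @classmethod
    def from_excel(cls, path):
        """Alternative constructor that reads the frame from an Excel file."""
        return cls(pd.read_excel(path))

    def add_totals(self):
        # computed before indexing, while 'apple' is still a regular column
        self.df = self.df.assign(
            **{'apples and cherries': self.df['apple'] + self.df['cherries']}
        )
        return self

    def set_index(self):
        self.indexed = self.df.set_index(self.INDEX_COLUMNS)
        return self

    def slice_group(self, label):
        # rows whose second index level ('cherry') equals label
        self.sliced = self.indexed.loc(axis=0)[pd.IndexSlice[:, label]]
        return self

    def total_fruits(self):
        """Row-wise sum of three columns of the current slice."""
        return self.sliced[['grapes', 'watermelon', 'orange']].sum(axis=1)


if __name__ == '__main__':
    # Made-up rows; with real data use FruitTable.from_excel('data.xlsx').
    table = FruitTable(pd.DataFrame({
        'apple': [1, 2, 3],
        'cherry': ['fruits', 'other', 'fruits'],
        'banana': [10, 20, 30],
        'cherries': [4, 5, 6],
        'grapes': [1, 1, 1],
        'watermelon': [2, 2, 2],
        'orange': [3, 3, 3],
    }))
    table.add_totals().set_index().slice_group('fruits')
    print(table.total_fruits())
```

After the chained call, table.df, table.indexed and table.sliced are all still available on the object, so no global variables are needed to reach them.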