python pandas getattr

How do you slice a pandas dataframe as an argument in a function?


I want to pass the rules for slicing a pandas DataFrame into a function as an argument.

For example:

import pandas as pd

row1 = {'a':5,'b':6,'c':7,'d':'A'}
row2 = {'a':8,'b':9,'c':10,'d':'B'}
row3 = {'a':11,'b':12,'c':13,'d':'C'}
df = pd.DataFrame([row1,row2,row3])

I am slicing the dataframe this way:

print(df.loc[df['a']==5])
print(df.loc[df['b']==12])
print(df.loc[(df['b']==12) | df['d'].isin(['A','C']),'d'])

For my purposes, I need to slice the same dataframe in different ways as part of a function. For example:

def slicing(locationargument):
    df.loc(locationargument)
    do some stuff..
    return something

Alternatively, I was expecting getattr() to work but that tells me DataFrames do not have a .loc[...] attribute. For example:

getattr(df,"loc[df['a']==5]")

Returns:

AttributeError: 'DataFrame' object has no attribute 'loc[df['a']==5]'

Am I missing something here? Any thoughts or alternatives would be greatly appreciated!


Solution

  • In pandas, .loc is not a function (or method) that you call on a DataFrame to select rows; it is an indexer attribute that you subscript. That is why df.loc(...) does not do what you want. Instead, you need to write df.loc[...] (brackets, not parentheses).

    So how about simply:

    def slicing(locationargument):
        subset = df.loc[locationargument]  # brackets, not parentheses
        # do some stuff with subset..
        return subset  # or whatever the function should return
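
    To make this concrete, here is a minimal sketch using the frame from the question. The function body is a placeholder (it just returns the selection), and the variable names by_a, by_b and d_only are made up for illustration. It shows the three slices from the question passed in as arguments, including the row-and-column case as a single tuple:

```python
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame([
    {'a': 5, 'b': 6, 'c': 7, 'd': 'A'},
    {'a': 8, 'b': 9, 'c': 10, 'd': 'B'},
    {'a': 11, 'b': 12, 'c': 13, 'd': 'C'},
])

def slicing(locationargument):
    # Placeholder body: a real function would do more with the selection.
    return df.loc[locationargument]

# A boolean Series with one entry per row works as a row mask.
by_a = slicing(df['a'] == 5)
by_b = slicing(df['b'] == 12)

# df.loc[rows, cols] is the same as df.loc[(rows, cols)],
# so a (mask, column) tuple can be passed as one argument.
d_only = slicing(((df['b'] == 12) | df['d'].isin(['A', 'C']), 'd'))
```

    The mask is built by the caller, so the function itself never needs to know which columns the rule refers to.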
    

    But then the question becomes: what type of object should locationargument be? If it's a boolean iterable whose length is equal to the number of rows in your data frame (for example, the Series produced by df['a']==5), you're all set. An alternative could be to make it a string and then write:

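    def slicing(locationargumentstring):
        # eval runs the string as Python code, so df must be in scope here
        subset = df.loc[eval(locationargumentstring)]
        # do some stuff with subset..
        return subset  # or whatever the function should return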
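
    Since eval executes whatever string it is given, it may be worth considering two alternatives that are not part of the answer above: .loc also accepts a callable that receives the frame and returns a mask, and DataFrame.query takes the rule as a string in its own expression syntax. A minimal sketch, assuming the frame from the question (the names slicing_by_rule, rows_callable and rows_query are hypothetical):

```python
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame([
    {'a': 5, 'b': 6, 'c': 7, 'd': 'A'},
    {'a': 8, 'b': 9, 'c': 10, 'd': 'B'},
    {'a': 11, 'b': 12, 'c': 13, 'd': 'C'},
])

def slicing_by_rule(rule):
    # `rule` is a callable; .loc calls it with df and uses the returned mask.
    return df.loc[rule]

# The rule is ordinary Python code rather than a string to be evaluated.
rows_callable = slicing_by_rule(lambda frame: frame['a'] == 5)

# If the rule really has to be a string, query parses column names directly.
rows_query = df.query("a == 5")
```

    Both calls select the same single row here; which one fits better depends on whether the rule is written in code or arrives as text.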
    

    If you go the getattr route, remember that the attribute name is only the name itself ("loc"); it can't include the subscript. So the following fails with the AttributeError from the question:

    getattr(df, "loc[df['a']==5]")
    

    but the following would work:

    getattr(df, "loc")[eval("df['a']==5")]
    

    and, more directly, so would

    getattr(df, "loc")[df['a']==5]
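
    As a small check of the point about attribute names, the sketch below (again assuming the frame from the question; indexer_name and picked are illustrative names) shows that getattr only looks up the literal string, and that the subscript is applied to whatever getattr returns:

```python
import pandas as pd

# Same frame as in the question.
df = pd.DataFrame([
    {'a': 5, 'b': 6, 'c': 7, 'd': 'A'},
    {'a': 8, 'b': 9, 'c': 10, 'd': 'B'},
    {'a': 11, 'b': 12, 'c': 13, 'd': 'C'},
])

# The attribute is called "loc"; the longer string is not an attribute at all.
has_loc = hasattr(df, "loc")
has_loc_with_subscript = hasattr(df, "loc[df['a']==5]")

# getattr returns the indexer; the brackets are applied afterwards.
indexer_name = "loc"  # hypothetical: a name decided at runtime
picked = getattr(df, indexer_name)[df['a'] == 5]
```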