python pandas dataframe group-by grouping

Can .apply use information from other groups?


For each element in a group, determine whether it is present in the next group (in the order the groups appear, which is not necessarily numerical order). For the last group, every element should be False.

Example:

import pandas as pd

df = pd.DataFrame({'group': [ 0,   1,   1,   0,   2 ],
                     'val': ['a', 'b', 'a', 'c', 'c']})
grouped = df.groupby('group')
print(result)  # desired output:
0     True
1    False
2    False
3    False
4    False
Name: val, dtype: bool

What is the best way to do it? I can accomplish it like this, but it seems too hacky:

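keys = list(grouped.groups.keys())

# Relies on apply() calling f once per group, in the same order as keys.
iterator_keys = iter(keys[1:])
def f(ser):
    # Last group: nothing follows it, so everything is False.
    if ser.name == keys[-1]:
        return ser.isin([])
    next_key = next(iterator_keys)
    return ser.isin(grouped.get_group(next_key)['val'])
result = grouped['val'].apply(f)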

Solution

  • Try:

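    g = df.groupby("group")

    # One set of values per group; shift(-1) pairs each group with the next group's set.
    # The last group has no successor, so it gets an empty set.
    m = g["val"].agg(set).shift(-1, fill_value=set())
    # x.name is the group key, so m[x.name] is the next group's set.
    x = g["val"].transform(lambda x: x.isin(m[x.name]))
    print(x)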
    

    Prints:

    0     True
    1    False
    2    False
    3    False
    4    False
    Name: val, dtype: bool
    

    Note:

    If you want the rows of the last group to receive some other value (not necessarily False), you can do this:

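    values = True  # value assigned to every row of the last group

    # Without fill_value, the last group's entry in m is NaN.
    m = g["val"].agg(set).shift(-1)
    x = g["val"].transform(lambda x: x.isin(m[x.name])
                                     if not pd.isnull(m[x.name])
                                     else values)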
    

    For example, if you set values = True, x will be:

    0     True
    1    False
    2    False
    3    False
    4     True
    Name: val, dtype: bool
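
One caveat that may be worth keeping in mind: groupby sorts the group keys by default, so "next group" in the snippets above means the next key in sorted order. If the order of first appearance is what matters, as the question words it, something along the lines of the following sketch might help. It assumes passing sort=False is acceptable for your data; the helper name in_next_group and its parameters are hypothetical and not part of pandas.

```python
import pandas as pd


def in_next_group(df, group_col="group", val_col="val"):
    """Flag each value that also occurs in the following group.

    Hypothetical helper (not a pandas API). Groups are ordered by first
    appearance; rows of the last group are always False.
    """
    # sort=False makes iteration follow first appearance, not sorted keys.
    grouped = df.groupby(group_col, sort=False)[val_col]
    keys, sets = [], []
    for key, vals in grouped:
        keys.append(key)
        sets.append(set(vals))

    # Pair every group key with the set of the group that follows it.
    # The last key is left out and falls back to an empty set below.
    next_sets = dict(zip(keys, sets[1:]))

    flags = [val in next_sets.get(key, set())
             for key, val in zip(df[group_col], df[val_col])]
    return pd.Series(flags, index=df.index, name=val_col, dtype=bool)


df = pd.DataFrame({"group": [0, 1, 1, 0, 2],
                   "val": ["a", "b", "a", "c", "c"]})
result = in_next_group(df)
print(result.tolist())  # [True, False, False, False, False]
```

For the example frame the sorted order and the order of appearance coincide, so the result matches the output shown above. The two only differ when a later key sorts before an earlier one, for instance string keys appearing as "z" and then "a".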