
Pandas - Lambda inside apply to return a row


I was expecting to get whole rows when using a lambda function inside apply on a Pandas DataFrame, but it looks like I'm getting a "single element" per row instead.

Consider this code:

import pandas as pd

# Data sample
reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0}, 
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'}, 
    'points': {0: 87, 1: 87, 2: 87, 3: 87}
})

print(reviews_2)

mean_price_2 = reviews_2.price.mean()  # the value used for centering

def remean_points(row):
    row = row.copy()  # work on a copy; apply does not support mutating the row it passes in
    row.price = row.price - mean_price_2
    return row

centered_price_2 = reviews_2.apply(remean_points, axis='columns') # returns a DataFrame

print(centered_price_2)

That "apply" returns a DataFrame. That is my expected output!

So, I tried to use a lambda function, doing:

reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0}, 
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'}, 
    'points': {0: 87, 1: 87, 2: 87, 3: 87}
})
print(reviews_2)

mean_price_2 = reviews_2.price.mean()

centered_price_2 = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') # returns a Series!

print(centered_price_2)

But now, "apply" returns a Series!

I know that apply tries to infer the output type.
I was expecting to get a row, but it looks like it returns a "single element"...

So my question:

Shouldn't p in the lambda function be a row?

Interesting:

If I do centered_price_2 = reviews_2.apply(lambda p: p, axis='columns'),
I get a DataFrame...

Yet:

How can I use lambda with apply and be sure about the output type?


Solution

  • I asked this question in 2020. Now, in 2024, reviewing my open questions, I understand Pandas a bit more (just a bit)!

    So...

    My mistake was here:

    mean_price_2 = reviews_2.price.mean()
    
    centered_price_2 = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') # returns a Series!
    

    Let me explain:

    1. As I said back then, apply infers its output type from what the function returns.
    2. mean_price_2 = reviews_2.price.mean() is a scalar (a single float), not a Series.
    3. With axis='columns', p is one row of the DataFrame, passed to the lambda as a Series.
    4. So p.price - mean_price_2 evaluates to one scalar per row, and apply collects those scalars into a Series.

    In 2020, I wrongly thought lambda p: ... should always give back rows, since p is a row. The lambda's return type comes from the evaluated expression: return a row and apply builds a DataFrame (which is why lambda p: p gave one); return a scalar and apply builds a Series.
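
To see how the lambda's return value decides what apply produces, here is a minimal sketch using the sample data from the question. The names as_series and as_frame are mine, for illustration only, and rebuilding the row with pd.Series is just one possible way to return a whole row from a lambda.

```python
import pandas as pd

# Same sample data as in the question
reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0},
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'},
    'points': {0: 87, 1: 87, 2: 87, 3: 87},
})

mean_price_2 = reviews_2.price.mean()  # a single float (NaN is skipped)

# With axis='columns', each p is one row, passed in as a Series.
# The lambda returns one scalar per row, so apply assembles a Series.
as_series = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns')

# The lambda returns a whole row (a Series), so apply assembles a DataFrame.
as_frame = reviews_2.apply(
    lambda p: pd.Series({**p.to_dict(), 'price': p.price - mean_price_2}),
    axis='columns',
)

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
```

So the output type is predictable once you look at what the expression after the colon evaluates to for a single row.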

    One solution to fix my code would be:

    import pandas as pd

    reviews_2 = pd.DataFrame({
        'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0}, 
        'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'}, 
        'points': {0: 87, 1: 87, 2: 87, 3: 87}
    })
    
    print(reviews_2)
    
    mean_price_2 = reviews_2.price.mean()
    
    # note the next two lines
    centered_price_2 = reviews_2.copy()  # a real copy; plain assignment would only alias reviews_2
    centered_price_2['price'] = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns')  # only change the desired column
    
    print(centered_price_2)
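
For this particular centering task, apply may not be needed at all. A possible alternative, sketched here with the same sample data, is to subtract the mean from the whole column at once and use DataFrame.assign to get a new DataFrame back. This is my own suggestion, not part of the original fix.

```python
import pandas as pd

# Same sample data as in the question
reviews_2 = pd.DataFrame({
    'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0},
    'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'},
    'points': {0: 87, 1: 87, 2: 87, 3: 87},
})

# Column-wise subtraction: the scalar mean is subtracted from every price.
# assign returns a new DataFrame, so reviews_2 itself is left unchanged.
centered_price_2 = reviews_2.assign(
    price=reviews_2.price - reviews_2.price.mean()
)

print(centered_price_2)
```

Because assign always returns a DataFrame, the output type does not depend on what a per-row function happens to return.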
    

    Happy 2024!