I expected to get whole rows when using a lambda function inside apply on a Pandas DataFrame, but it looks like I'm getting a "single element".
Consider this code:
import pandas as pd
# Data sample
reviews_2 = pd.DataFrame({
'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0},
'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'},
'points': {0: 87, 1: 87, 2: 87, 3: 87}
})
print(reviews_2)
mean_price_2 = reviews_2.price.mean()  # value used to center the prices
def remean_points(row):
    row.price = row.price - mean_price_2
    return row
centered_price_2 = reviews_2.apply(remean_points, axis='columns')  # returns a DataFrame
print(centered_price_2)
That "apply" returns a DataFrame, which is my expected output!
So I tried to use a lambda function instead:
reviews_2 = pd.DataFrame({
'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0},
'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'},
'points': {0: 87, 1: 87, 2: 87, 3: 87}
})
print(reviews_2)
mean_price_2 = reviews_2.price.mean()
centered_price_2 = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns')  # returns a Series!
print(centered_price_2)
But now "apply" returns a Series!
I know that apply tries to infer the result type.
I was expecting to get a row back, but it looks like it returns a "single element"...
So my question: shouldn't p in the lambda function be a row?
Interesting: if I do
centered_price_2 = reviews_2.apply(lambda p: p, axis='columns')
I get a DataFrame...
Still: how can I use lambda and apply and be sure about the output type?!
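One way to check what p actually is would be to have the lambda report the type of its argument. The snippet below is only a minimal sketch using the same sample data; the variable names row_types and first_row_labels are made up for illustration.

```python
import pandas as pd

reviews_2 = pd.DataFrame({
    'price': [None, 15.0, 14.0, 13.0],
    'country': ['Italy', 'Portugal', 'US', 'US'],
    'points': [87, 87, 87, 87],
})

# With axis='columns', the function is called once per row,
# and p is that row. Record the type name of each p.
row_types = reviews_2.apply(lambda p: type(p).__name__, axis='columns')
print(row_types.unique())

# Each row is indexed by the DataFrame's column labels,
# which is why p.price works inside the lambda.
first_row_labels = reviews_2.apply(lambda p: ','.join(p.index), axis='columns').iloc[0]
print(first_row_labels)
```

So p is a row in both versions; what differs is what the function returns.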
I asked this question in 2020, and now, in 2024, reviewing my open questions, I understand Pandas a bit more (just a bit)!
So...
My mistake was here:
mean_price_2 = reviews_2.price.mean()
centered_price_2 = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns')  # returns a Series!
Let me explain:
apply tries to infer the result type from what the function returns.
mean_price_2 = reviews_2.price.mean() is a scalar (a float), and with axis='columns', p is a single row, passed to the function as a Series.
So p.price - mean_price_2 evaluates to a scalar, one per row, and apply collects those scalars into a Series.
That is why centered_price_2 = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') returns a Series.
In 2020, I wrongly thought that lambda p: ... should always give back a whole row (and so apply should give a DataFrame), since p is a row.
The type returned by the lambda comes from the evaluated expression...
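To make the difference concrete, here is a small side-by-side sketch: the same centering done once with a function that returns a scalar and once with a function that returns the whole row. The helper name center_row and the variables as_series and as_frame are hypothetical, introduced only for this illustration.

```python
import pandas as pd

reviews_2 = pd.DataFrame({
    'price': [None, 15.0, 14.0, 13.0],
    'country': ['Italy', 'Portugal', 'US', 'US'],
    'points': [87, 87, 87, 87],
})
mean_price_2 = reviews_2.price.mean()  # a scalar; missing prices are skipped

# The function returns one scalar per row -> apply builds a Series.
as_series = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns')

# The function returns the (modified) row -> apply builds a DataFrame.
def center_row(row):  # hypothetical helper, not from the original post
    row = row.copy()  # work on a copy of the row
    row['price'] = row['price'] - mean_price_2
    return row

as_frame = reviews_2.apply(center_row, axis='columns')

print(type(as_series).__name__)
print(type(as_frame).__name__)
```

If the output type matters, the simplest rule of thumb seems to be: decide what the function returns per row, and the result follows from that.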
One way to fix my code would be:
reviews_2 = pd.DataFrame({
'price': {0: None, 1: 15.0, 2: 14.0, 3: 13.0},
'country': {0: 'Italy', 1: 'Portugal', 2: 'US', 3: 'US'},
'points': {0: 87, 1: 87, 2: 87, 3: 87}
})
print(reviews_2)
mean_price_2 = reviews_2.price.mean()
# note the next two lines
centered_price_2 = reviews_2.copy()  # copy the DataFrame so reviews_2 is left unchanged
centered_price_2.price = reviews_2.apply(lambda p: p.price - mean_price_2, axis='columns') # Only change the desired column!
print(centered_price_2)
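For this particular case, apply may not be needed at all. As an alternative sketch (not what the original code did), the subtraction can be done on the whole column at once, and DataFrame.assign returns a new DataFrame without touching reviews_2:

```python
import pandas as pd

reviews_2 = pd.DataFrame({
    'price': [None, 15.0, 14.0, 13.0],
    'country': ['Italy', 'Portugal', 'US', 'US'],
    'points': [87, 87, 87, 87],
})
mean_price_2 = reviews_2['price'].mean()

# Column-wise subtraction; assign() returns a new DataFrame
# with the 'price' column replaced and the others kept as-is.
centered_price_2 = reviews_2.assign(price=reviews_2['price'] - mean_price_2)

print(centered_price_2)
print(reviews_2)  # the original prices are still here
```

Here the output type is never in doubt: assign always gives back a DataFrame.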
Happy 2024!