I am new to Python and would like to assign a value based on a comparison, e.g. "right" if >, "left" if <, "equal" if ==, within a list comprehension.
I have tried the below, but it throws an error. Can multiple conditions be specified in a single list comprehension in this way, where each "elif" generates a different output, or will I need to use a loop?
Fully reproducible example:
from sklearn.datasets import load_iris
bunch = load_iris(as_frame=True)
df = bunch.data.reset_index().rename(columns={"index": "id"}).merge(bunch.target.reset_index().rename(columns={"index": "id"})).drop(["id"], axis=1)
# question is in last row, "skew"
datasummary_dct = {
    "50%": [df[col].median().round(2) if any(t in str(df[col].dtype) for t in ("float", "int", "time")) else " " for col in df.columns],
    "mean": [df[col].mean().round(2) if any(t in str(df[col].dtype) for t in ("float", "int", "time")) else " " for col in df.columns],
    "skew": ["left" if df[col].median() > df[col].mean() else "right" if df[col].median() < df[col].mean() else "equal" if df[col].median()==df[col].mean() if any(t in str(df[col].dtype) for t in ("float", "int", "time")) else " " for col in df.columns],
}
Again, I am still fairly new to programming; apologies if I do not immediately understand the solution. Any guidance is appreciated!
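The error comes from the final "equal" if ..., which has no matching else before the dtype check; in a conditional expression every if needs its own else. Instead of the complex nested if, you can use np.select for this, which is much more readable: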
import numpy as np
import pandas as pd

# df is the DataFrame built in the question
datasummary_dct = {
    "skew": [
        np.select(
            [df[col].median() > df[col].mean(), df[col].median() < df[col].mean()],
            ["right", "left"],
            "equal",
        )
        if any(t in str(df[col].dtype) for t in ("float", "int", "time"))
        else " "
        for col in df.columns
    ],
}
print(pd.DataFrame(datasummary_dct))
Output:
    skew
0   left
1   left
2  right
3  right
4  equal
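If you would rather stay in plain Python, a chained conditional expression also works once every if has a matching else and the type guard wraps the whole chain. The following is only a minimal sketch: skew_label and the columns dict are hypothetical names made up for illustration, and the data is a small stand-in rather than the iris frame. It keeps the label order used in the np.select call above ("right" when the median is larger); swap the two strings to match the question's own code.

```python
from statistics import mean, median


def skew_label(med, avg):
    """Hypothetical helper: label skew from a median and a mean."""
    # Each `if` is paired with an `else`; the last `else` is the fallback.
    return "right" if med > avg else "left" if med < avg else "equal"


# Hypothetical stand-in data; the last column is non-numeric on purpose.
columns = {
    "a": [1, 2, 3, 10],    # median 2.5 < mean 4
    "b": [1, 8, 9, 10],    # median 8.5 > mean 7
    "c": [1, 2, 3],        # median 2 == mean 2
    "d": ["x", "y", "z"],  # non-numeric, gets a blank label
}

skew = [
    skew_label(median(values), mean(values))
    if all(isinstance(v, (int, float)) for v in values)
    else " "
    for values in columns.values()
]
print(skew)  # ['left', 'right', 'equal', ' ']
```

With the DataFrame from the question, the same helper could be called as skew_label(df[col].median(), df[col].mean()) inside the existing dtype guard, which keeps the comprehension to one readable expression per column.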