I've written this script, which creates new columns based on a value meeting two conditions.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df=pd.DataFrame()
df['variable 1']= np.arange(0,1.1,0.1)
df['variable 2']= 0.2*df['variable 1']
df['variable 3']= 0.4 -0.2*df['variable 1']
# Create new columns
slope = [2, 1.5, 1, 0.5]
for i in range(len(slope)):
    df['slope = ' + str(slope[i])]=''
    for j in range(len(df['variable 1'])):
        # Calculating Scl_disp_sd with equation 1
        curve = 0.5 - slope[i]*df['variable 1'][j]
        df['slope = ' + str(slope[i])][j]= np.where((curve>df['variable 2'][j]) & (curve<df['variable 3'][j]), curve,np.nan)
display(df)
plt.plot(df['variable 1'], df['variable 2'], 'o', label='variable 2')
plt.plot(df['variable 1'], df['variable 3'], 'o', label='variable 3')
plt.plot(df['variable 1'], df.filter(like='slope =', axis=1), marker='.')
plt.legend()
The script works; however, I get this message:
/var/folders/m0/_y1fs5x50xx99pjg2yf42y7r0000gp/T/ipykernel_1964/2618301266.py:11: FutureWarning: ChainedAssignmentError: behaviour will change in pandas 3.0!
You are setting values through chained assignment. Currently this works in certain cases, but when using Copy-on-Write (which will become the default behaviour in pandas 3.0) this will never work to update the original DataFrame or Series, because the intermediate object on which we are setting values will behave as a copy.
A typical example is when you are setting values in a column of a DataFrame, like:
df["col"][row_indexer] = value
Use `df.loc[row_indexer, "col"] = values` instead, to perform the assignment in a single step and ensure this keeps updating the original `df`.
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['slope = ' + str(slope[i])][j]= np.where((curve>df['variable 2'][j]) & (curve<df['variable 3'][j]),
/var/folders/m0/_y1fs5x50xx99pjg2yf42y7r0000gp/T/ipykernel_1964/2618301266.py:11: SettingWithCopyWarning:
A value is trying to be set on a copy of a slice from a DataFrame
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
df['slope = ' + str(slope[i])][j]= np.where((curve>df['variable 2'][j]) & (curve<df['variable 3'][j]),
...
I'd appreciate it if someone has another idea for how to write this script so that the message no longer appears.
No need for the nested loop. Just apply your operation to the whole column as a vector:
slope = [2, 1.5, 1, 0.5]
for i in range(len(slope)):
    curve = 0.5 - slope[i]*df['variable 1']
    df['slope = ' + str(slope[i])] = np.where(
        (curve > df['variable 2']) & (curve < df['variable 3']),
        curve, np.nan)
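The same masking step can also be written with pandas alone. The following is only a sketch of that variant, not part of the original answer: it swaps `np.where` for `Series.where`, which keeps values where the mask is True and fills NaN elsewhere. The name `inside` is introduced here for illustration, and the frame is rebuilt so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question so the snippet is self-contained
df = pd.DataFrame({'variable 1': np.arange(0, 1.1, 0.1)})
df['variable 2'] = 0.2 * df['variable 1']
df['variable 3'] = 0.4 - 0.2 * df['variable 1']

for s in [2, 1.5, 1, 0.5]:
    curve = 0.5 - s * df['variable 1']
    # 'inside' is an illustrative name: True where the curve lies strictly between the two bounds
    inside = (curve > df['variable 2']) & (curve < df['variable 3'])
    # Series.where keeps curve where the mask is True and inserts NaN everywhere else
    df[f'slope = {s}'] = curve.where(inside)
```

Each new column is assigned in a single step on `df`, so no chained assignment is involved.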
Or fully vectorized with NumPy:
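# the 4 slopes broadcast against the (n, 1) column vector, giving an (n, 4) array
curve = 0.5 - np.array(slope)*df['variable 1'].to_numpy()[:, None]
cols = [f'slope = {c}' for c in slope]
# df[['variable 2']] has shape (n, 1), so each bound is compared against every slope column
df[cols] = np.where((curve > df[['variable 2']].to_numpy())
                    & (curve < df[['variable 3']].to_numpy()),
                    curve, np.nan)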
Output:
    variable 1  variable 2  variable 3  slope = 2  slope = 1.5  slope = 1  slope = 0.5
0          0.0        0.00        0.40        NaN          NaN        NaN          NaN
1          0.1        0.02        0.38        0.3         0.35        NaN          NaN
2          0.2        0.04        0.36        0.1         0.20        0.3          NaN
3          0.3        0.06        0.34        NaN          NaN        0.2          NaN
4          0.4        0.08        0.32        NaN          NaN        0.1         0.30
5          0.5        0.10        0.30        NaN          NaN        NaN         0.25
6          0.6        0.12        0.28        NaN          NaN        NaN         0.20
7          0.7        0.14        0.26        NaN          NaN        NaN         0.15
8          0.8        0.16        0.24        NaN          NaN        NaN          NaN
9          0.9        0.18        0.22        NaN          NaN        NaN          NaN
10         1.0        0.20        0.20        NaN          NaN        NaN          NaN
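If the row-by-row loop has to stay for some reason, the warning's own suggestion should be enough to silence it: write through `df.loc[row, col]` in one step instead of `df[col][row]`. A rough sketch under that assumption follows; the variable `col` and the initial NaN fill are choices made here, not taken from the question, and this version will be much slower than the vectorized ones on large frames.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question so the snippet is self-contained
df = pd.DataFrame({'variable 1': np.arange(0, 1.1, 0.1)})
df['variable 2'] = 0.2 * df['variable 1']
df['variable 3'] = 0.4 - 0.2 * df['variable 1']

for s in [2, 1.5, 1, 0.5]:
    col = f'slope = {s}'   # illustrative helper name
    df[col] = np.nan       # start with a float column instead of empty strings
    for j in df.index:
        curve = 0.5 - s * df.loc[j, 'variable 1']
        if df.loc[j, 'variable 2'] < curve < df.loc[j, 'variable 3']:
            # single indexing step on df itself, so there is no chained assignment
            df.loc[j, col] = curve
```

Starting from NaN rather than `''` also keeps the new columns numeric, which matters when they are passed to `plt.plot`.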