I am encountering the following error when I use df.loc
. I have a parameter s which is datetime64[ns], I have dataframe with two of the column MDT1 and MTD2, which is also datetime64[ns] type. When I use the following statement using df.loc
s= df_1x2['date'].astype (str)+'-'+ df_1x2['Time'].astype(str)
s = pd.to_datetime(s)
df_1x2.loc[(s>df_1x2['MDT1'])&(s<df_1x2['MDT2']),'Morning']= s
The last line of the code shows me the following error in Jupyter Notebook
ValueError: Can only compare identically-labeled Series objects
However, when I run second time, it works as expected. It was just that the first run in Jupyter Notebook got me the error.
Below is the link to download the sample data file:
Click here to download the sample data file
Below is the code:
s = pd.to_datetime(df['Date']).astype(str)+' '+df['Time.1'].astype(str)+':00'
df['Matchdate'] = pd.to_datetime (s)
d =np.reciprocal(df_1x2[['Home','Draw','Away']].astype(float))
df_1x2['Margin'] = d.sum(axis = 1)-1
df_1x2['Open_DT'] = df_1x2['date'].astype (str)+'-'+ df_1x2['Time'].astype(str)
df_1x2['Open_DT'] = pd.to_datetime(df_1x2['Open_DT'])
df_1x2['Open_DT1'] = df_1x2.groupby(['Match','Odds_Type'])['Open_DT'].transform(min)
s = df_1x2['Open_DT'] == df_1x2['Open_DT1']
df_1x2.loc[s,'Open'] ='Opening'
df_1x2['Current_DT'] = df_1x2['date'].astype (str)+'-'+ df_1x2['Time'].astype(str)
df_1x2['Current_DT'] = pd.to_datetime(df_1x2['Current_DT'])
df_1x2['Current_DT1'] = df_1x2.groupby(['Match','Odds_Type'])['Current_DT'].transform(max)
sc = df_1x2['Current_DT'] == df_1x2['Current_DT1']
df_1x2.loc[sc,'Current'] ='Current'
df_1x2.loc[df_1x2.Open == 'Opening','Current']='Opening'
df_1x2 ['morning Date'] = pd.Timestamp.today()
Time1 = '08:00:00'
Time2 = '11:00:00'
s = df_1x2['date'].astype (str)+'-'+ df_1x2['Time'].astype(str)
s = pd.to_datetime(s)
df_1x2.reset_index(drop = True, inplace = True)
df_1x2 ['morning Date'] = pd.to_datetime(df_1x2 ['morning Date']).dt.date
df_1x2['MDT1'] = pd.to_datetime(df_1x2['morning Date'].astype(str)+' '+Time1)
df_1x2['MDT2'] = pd.to_datetime(df_1x2['morning Date'].astype(str)+' '+Time2)
df_1x2 .reset_index(drop= True, inplace = True)
df_1x2.loc[(s>df_1x2['MDT1'])&(s<df_1x2['MDT2']),'Morning']= s
How can I rectify it?
You are applying an entire series s
to a filtered dataframe column. I think it is easier to just use mask
or np.where
instead of .loc
:
df_1x2['Morning'] = df_1x2['Morning'].mask((s > df_1x2['MDT1']) & (s < df_1x2['MDT2']), s)
or np.where
:
df_1x2['Morning'] = np.where((s > df_1x2['MDT1']) & (s < df_1x2['MDT2']), s, df_1x2['Morning'])
If the column doesn't exist yet, then use:
df_1x2['Morning'] = np.where((s > df_1x2['MDT1']) & (s < df_1x2['MDT2']), s, np.datetime64('NaT'))
.loc
would be okay for this use case if you were assigning a single value like 10
rather than a series that has a longer length than your filtered dataframe.
The issue is that you reset_index()
after defining s
, which means it is not comparable to your dataframe:
s = pd.to_datetime(df['Date']).astype(str)+' '+df['Time.1'].astype(str)+':00'
df['Matchdate'] = pd.to_datetime (s)
d =np.reciprocal(df_1x2[['Home','Draw','Away']].astype(float))
df_1x2['Margin'] = d.sum(axis = 1)-1
df_1x2['Open_DT'] = df_1x2['date'].astype (str)+'-'+ df_1x2['Time'].astype(str)
df_1x2['Open_DT'] = pd.to_datetime(df_1x2['Open_DT'])
df_1x2['Open_DT1'] = df_1x2.groupby(['Match','Odds_Type'])['Open_DT'].transform(min)
s = df_1x2['Open_DT'] == df_1x2['Open_DT1']
df_1x2.loc[s,'Open'] ='Opening'
df_1x2['Current_DT'] = df_1x2['date'].astype (str)+'-'+ df_1x2['Time'].astype(str)
df_1x2['Current_DT'] = pd.to_datetime(df_1x2['Current_DT'])
df_1x2['Current_DT1'] = df_1x2.groupby(['Match','Odds_Type'])['Current_DT'].transform(max)
sc = df_1x2['Current_DT'] == df_1x2['Current_DT1']
df_1x2.loc[sc,'Current'] ='Current'
df_1x2.loc[df_1x2.Open == 'Opening','Current']='Opening'
df_1x2 ['morning Date'] = pd.Timestamp.today()
Time1 = '08:00:00'
Time2 = '11:00:00'
########### I moved s from here...........
df_1x2.reset_index(drop = True, inplace = True)
df_1x2 ['morning Date'] = pd.to_datetime(df_1x2 ['morning Date']).dt.date
df_1x2['MDT1'] = pd.to_datetime(df_1x2['morning Date'].astype(str)+' '+Time1)
df_1x2['MDT2'] = pd.to_datetime(df_1x2['morning Date'].astype(str)+' '+Time2)
df_1x2 .reset_index(drop= True, inplace = True)
########### .........to here.
##### Resetting the index after defining s, means you can no longer compare directly to your dataframe as s has a different index than your dataframe now.
s = df_1x2['date'].astype (str)+'-'+ df_1x2['Time'].astype(str)
s = pd.to_datetime(s)
df_1x2.loc[(s>df_1x2['MDT1'])&(s<df_1x2['MDT2']),'Morning']= s