I have a dataset named df
Multiplicating two columns (Open and low, which have the equal values) should give the result nearly equal to the highest integer (decimal) value referring to the 'High' column & store the result in High_new with its corresponding date.
For example in the table attached -
Open and High both columns have equal value (100) so 100*2 = 200 and 200.8 is the highest decimal(integer) value which is closest to 200, so we get 200.8 in High_new column and its corresponding date (refer output dataset) (** Note - we haven't considered 200.3 since 200.8 is highest decimal value of 200)
So, how do I get column High_new and date_new column? (Can be solved using python/SQL)
I have tried below code so far (sharing the output of this too)
** (This is just the initial logic which might work) -
import pandas as pd
import numpy as np
import re
data = {'Date':['11/1/18', '12/3/18', '1/1/19', '2/1/19', '3/1/19'],
'Open':[72, 99, 101.25, 92.05, 100],
'High':[115.4, 144.3, 200.8, 200.3, 135.5],
'Low' :[72,91,88.3,84,100]}
# Create DataFrame
df = pd.DataFrame(data)
# Print the output.
df
df['high2'] = np.where(df['Open'] == df['Low'], df['Open']*2, np.nan)
I find the "largest decimal" thing to be unusual and likely to be a misunderstanding, but this does what you ask. I create a separate dataframe to keep the largest fraction within each integer, and then do a row-by-row to look up the closest.
import pandas as pd
data = {'Date':['11/1/18', '12/3/18', '1/1/19', '2/1/19', '3/1/19'],
'Open':[72, 99, 101.25, 92.05, 100],
'High':[115.4, 144.3, 200.8, 200.3, 135.5],
'Low' :[72,91,88.3,84,100]}
# Create DataFrame.
df = pd.DataFrame(data)
# Create a dataframe with only the highest for each integer.
sub = df[['Date','High']].sort_values(by='High')
sub['IntHigh'] = df['High'].astype(int)
sub.drop_duplicates('IntHigh',keep='last', inplace=True)
# Find closest high for each row where open==low.
def findclosest(row):
if row['Open'] != row['Low']:
return row['High'],row['Date']
sub['Delta'] = (sub['IntHigh']-row['Open']*2).abs()
chk = sub.sort_values('Delta').index[0]
return sub.loc[chk,['High','Date']]
df[['High2','Date2']] = df.apply( findclosest, axis=1 )
print(df)
Output:
Date Open High Low High2 Date2
0 11/1/18 72.00 115.4 72.0 144.3 12/3/18
1 12/3/18 99.00 144.3 91.0 144.3 12/3/18
2 1/1/19 101.25 200.8 88.3 200.8 1/1/19
3 2/1/19 92.05 200.3 84.0 200.3 2/1/19
4 3/1/19 100.00 135.5 100.0 200.8 1/1/19