Hi guys i started at jupyter notebook few days ago.
I need help, i have a dataframe by panda. something like this
Date Stock Company Volume
01/02 APPL3 Apple 1.000.000
01/02 YUSS Yusduqs 200.000
01/02 APPL4 Apple 200.000
01/02 DISN Disney 1.500.000
02/02 APPL3 Apple 100.000
02/02 YUSS Yusduqs 1.250.000
02/02 DISN Disney 2.000.000
02/02 APPL4 Apple 1.250.000
... ... ....
I need to select the stock that was traded in more than 80% of the days with volume greater than $ 500.000,00 per day.
And i need to select **only one stock per firm, the criterio is which has more volume in all days combined. Like for 'Apple' in [Company] i have two diferents [Stock] Appl3 and Appl4, in this specific case i only need APPL4.
(Because Volume of the days combined in Appl4 > Volume of the days combined in Appl3)
I started like this:
unique_dates=len(df['Date'].value_counts())
share_freq=df[df['Volume']>=500000]]['Stock'].value_counts() stocks=share_freq/unique_dates for stock,value in stocks.items():
if(value>0.8):
print(stock)
So after that i can see which one has>0.8 but i still need to select only one stock per firm. I dont know how to respect all the criterios and by the end filter all the dataframe by the criterios and save in .csv
We can use nunique
n=df.loc[df['Volume']>=500000,'Date'].groupby(df['Stock']).nunique()
uniquedate=df.Date.nunique()
n=n[n/uniquedate >0.8]
print(n.index)
Update
df.loc[df.Stock.isin(n.index)].to_excel('output.xlsx')