I have a dataframe containing historical stock data that looks like this:
stock_name | price | time_stamp |
---|---|---|
stock_a | 35.6 | 2024-01-01 |
stock_a | 34.7 | 2024-01-02 |
stock_b | 56.7 | 2024-01-01 |
stock_b | 55 | 2024-01-02 |
I have up to 500 stocks in the database, and I want to calculate the correlation between stock_a and every other stock (e.g. stock_b, stock_c, stock_d, ...) based on their prices over the time interval.
I expect a result like:
Stock_A correlation
stock_name | correlation |
---|---|
stock_b | 0.87 |
stock_c | 0.76 |
Right now I can only calculate the correlation by specifying each stock other than stock_a one at a time (PostgreSQL as an example):
select 'stock_b' as stock_name, corr(d1.close_price, d2.close_price) as correlation
from stock_dmt d1
inner join stock_dmt d2
on d1.time_stamp = d2.time_stamp
where d1.stock_code = 'stock_a' and d2.stock_code = 'stock_b';
However, it is impractical to do that manually for over 500 stocks.
Is there an efficient way to do this? Either a SQL or a Python solution is welcome.
Thank you
Here is one way to do it with Python. I hope it answers your question.
import pandas as pd
# Create some sample data in the same long format as the question
data = {
"stock_name": ["stock_a", "stock_a", "stock_b", "stock_b"],
"price": [35.6, 34.7, 56.7, 55.0],
"time_stamp": ["2024-01-01", "2024-01-02", "2024-01-01", "2024-01-02"]
}
# Create the DataFrame from the dictionary
df = pd.DataFrame(data)
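# Pivot so that each stock becomes a column, indexed by time_stamp
df_pivot = df.pivot(index="time_stamp", columns="stock_name", values="price")
# Pairwise correlation matrix between all stocks
corr_matrix = df_pivot.corr()
# Correlation of stock_a with every other stock
print(corr_matrix["stock_a"].drop("stock_a"))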
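If only the stock_a row is needed, building the full 500 x 500 matrix is more work than necessary. A possible alternative is `DataFrame.corrwith`, which correlates every column against a single series. The sketch below is just an illustration: the helper name `correlations_with` and the sample prices are made up, and it assumes there is at most one price per stock per timestamp (otherwise `pivot` raises an error).

```python
import pandas as pd


def correlations_with(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Correlate one stock's prices with every other stock's prices.

    Hypothetical helper; expects long-format columns
    stock_name, price, time_stamp as in the question.
    """
    # One column per stock, one row per timestamp.
    wide = df.pivot(index="time_stamp", columns="stock_name", values="price")
    # Pearson correlation of each remaining column against the target column.
    result = wide.drop(columns=target).corrwith(wide[target])
    return (
        result.rename("correlation")
        .rename_axis("stock_name")
        .sort_values(ascending=False)
        .reset_index()
    )


# Made-up sample: stock_b moves with stock_a, stock_c moves against it.
dates = ["2024-01-01", "2024-01-02", "2024-01-03", "2024-01-04"]
sample = pd.DataFrame(
    {
        "stock_name": ["stock_a"] * 4 + ["stock_b"] * 4 + ["stock_c"] * 4,
        "price": [1.0, 2.0, 3.0, 4.0, 2.0, 4.0, 6.0, 8.0, 9.0, 7.0, 5.0, 3.0],
        "time_stamp": dates * 3,
    }
)

out = correlations_with(sample, "stock_a")
print(out)
```

The result has the two columns from the expected table in the question (stock_name, correlation), sorted from most to least correlated. With real data loaded from PostgreSQL, the same function should apply once the column names are matched (for example stock_code and close_price renamed to stock_name and price).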