Search code examples
pandasyfinance

Pandas convert groupby parameter to separate rows


I have some stock data in a pandas dataframe which is grouped by ticker. I want to amend the table format so that a row exists for each date and each ticker.

import yfinance as yf

def get_all_data(tickers, start_date="2009-01-01", end_date="2020-09-25"):
    df_orig = yf.download(tickers, start=start_date, end=end_date, group_by='ticker')
    df_orig = df_orig.fillna(method='ffill')
    return df_orig


sec_list = ['ULVR.L', 'MSFT', 'ABF.L']
df_orig = get_all_data(sec_list, start_date="2020-09-21", end_date="2020-09-25")
display(df_orig)

I currently get this enter image description here, but I want the data in the format below (i.e. 7 columns/12rows, rather than 15columns/4rows)

How should I do this?


| Date       | Ticker | Open   |
| ---------- | ------ | ------ |
| 2020-09-21 | MSFT   | 197.19 |
| 2020-09-21 | ABF.L  | 1903.5 |
| 2020-09-21 | ULVR.L | 4706   |
| 2020-09-22 | MSFT   | 205.06 |
| 2020-09-22 | ABF.L  | 1855   |
| 2020-09-22 | ULVR.L | 4671   |
| 2020-09-23 | MSFT   | 207.9  |
| 2020-09-23 | ABF.L  | 1870.5 |
| 2020-09-23 | ULVR.L | 4766   |
| 2020-09-24 | MSFT   | 199.85 |
| 2020-09-24 | ABF.L  | 1847   |
| 2020-09-24 | ULVR.L | 4743   |


Solution

  • You can use stack for that, naming the column/index maxes it easier as well:

    In [26]: df
    Out[26]:
                  a          b
               open close open close
    2020-10-10    1     2    3     4
    2020-10-10    5     6    7     8
    2020-10-10    1     2    3     4
    2020-10-10    6     7    8     9
    
    In [27]: df.columns.names = ["ticker", "metric"]
    
    In [28]: df.index.name = "date"
    
    In [29]: df.stack("ticker")
    Out[29]:
    metric             close  open
    date       ticker
    2020-10-10 a           2     1
               b           4     3
               a           6     5
               b           8     7
               a           2     1
               b           4     3
               a           7     6
               b           9     8
    

    Or if you don't care about naming things just use stack with an int:

    In [46]: df
    Out[46]:
                  a          b
               open close open close
    2020-10-10    1     2    3     4
    2020-10-10    5     6    7     8
    2020-10-10    1     2    3     4
    2020-10-10    6     7    8     9
    
    In [47]: df.stack(0)
    Out[47]:
                  close  open
    2020-10-10 a      2     1
               b      4     3
               a      6     5
               b      8     7
               a      2     1
               b      4     3
               a      7     6
               b      9     8
    
    # to set index names:
    
    In [56]: gf = df.stack(0)
    
    In [57]: gf.index = gf.index.set_names(["date", "ticker"])
    
    In [58]: gf
    Out[58]:
                       close  open
    date       ticker
    2020-10-10 a           2     1
               b           4     3
               a           6     5
               b           8     7
               a           2     1
               b           4     3
               a           7     6
               b           9     8