Search code examples
pythonpython-3.xpandasdataframequantitative-finance

How can I map a stock's specific sector over a period of time?


I am having trouble mapping stock specific sectors over a period of time. To give you an example. Here is the sector data of a stock.

ticker     sector_code   entry_date   exit_date
abc        sec1          20080501     20110504
abc        sec2          20110505     20120403
abc        sec3          20120404     NA

Time period is 01-01-2008 to 01-01-2020. I would like to create a table like this:

             abc
2008-01-01   no_sector
...
2008-05-01   sec1
2008-05-02   sec1
...
2011-11-11   sec2
...

My basic instinct is to use loops and if statements. But I found out that it would be so complicated and computationally expensive. I could not figure out any other ways to do this. Would you please give me a hand? Thank you very much.


Solution

  • Use:

    #convert values to datetimes and replace missing values by another column
    df['entry_date'] = pd.to_datetime(df['entry_date'], format='%Y%m%d')
    df['exit_date'] = pd.to_datetime(df['exit_date'], format='%Y%m%d').fillna(df['entry_date'])
    
    print (df)
      ticker sector_code entry_date  exit_date
    0    abc        sec1 2008-05-01 2011-05-04
    1    abc        sec2 2011-05-05 2012-04-03
    2    abc        sec3 2012-04-04 2012-04-04
    
    #for each row create date range and concat together
    s = pd.concat([pd.Series(r.Index,pd.date_range(r.entry_date, r.exit_date)) 
                              for r in df.itertuples()])
    
    #create new DataFrame and join original data with filtered columns by list
    df = (pd.DataFrame(s.index, index=s, columns=['date'])
            .join(df[['ticker','sector_code']])
            .reset_index(drop=True))
    print (df)
               date ticker sector_code
    0    2008-05-01    abc        sec1
    1    2008-05-02    abc        sec1
    2    2008-05-03    abc        sec1
    3    2008-05-04    abc        sec1
    4    2008-05-05    abc        sec1
            ...    ...         ...
    1430 2012-03-31    abc        sec2
    1431 2012-04-01    abc        sec2
    1432 2012-04-02    abc        sec2
    1433 2012-04-03    abc        sec2
    1434 2012-04-04    abc        sec3
    
    [1435 rows x 3 columns]