Search code examples
pythonpandasjupytermulti-index

Dividing rows within a multiindex


Good evening all who happen to find this post!

I'm having a slight issue when it comes to dividing a stocks netIncome by its totalRevenue within a multiindex.

The code is as follows:

import yahoo_fin.stock_info as si
import pandas as pd

company = ['AMZN', 'FB']

inc = {}
for ticker in company:
    inc[ticker] = si.get_income_statement(ticker)
df = pd.concat(inc)

pM = df.loc[(slice(None), ['netIncome', 'totalRevenue']), :]
pM

Now, all of this code should return me the company ticker, along with the net incomes and total revenue of the previous 4 years. Which is exactly what I need, and is all good so far...

I then want to take each company (ticker) in the index and divide the netIncome by totalRevenue (of the same company) and times that by 100 to get me a profit margin percentage...

If I use the code below it returns a load of NaNs within the rows...

test = pM.loc[(slice(None), 'netIncome'), :] / pM.loc[(slice(None), 'totalRevenue'), :] * 100
test

But if I manually type in the companies ticker like so:

pM.loc[('AMZN', 'netIncome'), :] / pM.loc[('AMZN', 'totalRevenue'), :] * 100

    endDate
2019-12-31    4.13087
2018-12-31    4.32527
2017-12-31    1.70522
2016-12-31    1.74355
dtype: object

It returns what I'm looking for, with that company...but obviously if I'm chopping and changing or have multiple stock tickers within the 'company' list then you can see it would be a bit long winded...

  • Do I need to create a new row within the multiindex?
  • Am I using the slice function in the wrong way?
  • should I be using df.div?

These are all questions I've asked myself and cant find an answer, so my final solution is here on this forum!

Didn't really want to bother anyone with my questions, but I don't know what else to do :(

Thank you in advance, and hopefully everything I've written is clear, any issues let me know.


Solution

  • The reason it failed was because the indices of the two operands were not aligned:

    >>> pM.loc[(slice(None), 'netIncome'), :]
    endDate          2019-12-31   2018-12-31   2017-12-31   2016-12-31
         Breakdown                                                    
    AMZN netIncome  11588000000  10073000000   3033000000   2371000000
    FB   netIncome  18485000000  22112000000  15934000000  10217000000
    
    >>> pM.loc[(slice(None), 'totalRevenue'), :]
    endDate              2019-12-31    2018-12-31    2017-12-31    2016-12-31
         Breakdown                                                           
    AMZN totalRevenue  280522000000  232887000000  177866000000  135987000000
    FB   totalRevenue   70697000000   55838000000   40653000000   27638000000
    

    pandas was looking for the label (AMZN, netIncome) in the second frame and there isn't one so the result of the revision is N/A. Same with the other rows.


    What you wanted was xs:

    margin = (
        pM.xs((slice(None), 'netIncome'))
            .div(pM.xs((slice(None), 'totalRevenue')))
            .mul(100)
            .assign(Breakdown='profitMargin')
            .set_index('Breakdown', append=True)
    )
    
    pM = pM.append(margin).sort_index()
    

    You can see how much work it takes just to add a metric to the data frame. That was because your pM frame was inefficiently designed. Try this instead:

    company = ['AMZN', 'FB']
    
    inc = {}
    for ticker in company:
        inc[ticker] = si.get_income_statement(ticker).T # ← added a `.T` here
    df = pd.concat(inc)
    
    df['profitMargin'] = df['netIncome'] / df['totalRevenue'] * 100