Scraping div class information using BeautifulSoup4


I'm trying to pull all the information in a table found on https://www.macrotrends.net/stocks/charts/GM/general-motors/income-statement?freq=Q. While experimenting in the browser's developer console, I was able to use document.querySelectorAll("[role=row]") to collect all the rows I wanted.

My problem is that I want to collect this information in Python. Using BeautifulSoup4, I was able to download the whole page, but the information is structured differently: nothing comes up when I search for the roles or class names I see in the browser. Below is a snippet of the downloaded source that contains the information I want (it looks nothing like what I see in the browser):

var originalData = [{"field_name":"<a href='\/stocks\/charts\/GM\/general-motors\/revenue'>Revenue<\/a>","popup_icon":"<div class='ajax-chart' data-tipped-options=\"ajax: {data: { t: 'GM', s: 'revenue', freq: 'Q', statement: 'income-statement' }}\"><i style='font-size:18px; color:#337ab7;' class='fas fa-chart-bar'><\/i><\/span><\/div>","2020-03-31":"32709.00000","2019-12-31":"30826.00000","2019-09-30":"35473.00000","2019-06-30":"36060.00000","2019-03-31":"34878.00000","2018-12-31":"38399.00000","2018-09-30":"35791.00000","2018-06-30":"36760.00000","2018-03-31":"36099.00000","2017-12-31":"37715.00000","2017-09-30":"33623.00000","2017-06-30":"36984.00000","2017-03-31":"37266.00000","2016-12-31":"35647.00000","2016-09-30":"38889.00000","2016-06-30":"37383.00000","2016-03-31":"37265.00000","2015-12-31":"22990.00000","2015-09-30":"38843.00000","2015-06-30":"38180.00000","2015-03-31":"35712.00000","2014-12-31":"39617.00000","2014-09-30":"39255.00000","2014-06-30":"39649.00000","2014-03-31":"37408.00000","2013-12-31":"40485.00000","2013-09-30":"38983.00000","2013-06-30":"39075.00000","2013-03-31":"36884.00000","2012-12-31":"39307.00000","2012-09-30":"37576.00000","2012-06-30":"37614.00000","2012-03-31":"37759.00000","2011-12-31":"37990.00000","2011-09-30":"36719.00000","2011-06-30":"39373.00000","2011-03-31":"36194.00000","2010-12-31":"36900.00000","2010-09-30":"34060.00000","2010-06-30":"33174.00000","2010-03-31":"31476.00000","2009-12-31":"","2008-12-31":""},...

That's the information for the first row, "Revenue". Excuse my ignorance, but this looks like a JavaScript variable holding everything in JSON format. Is there a way I can extract that variable and parse it like any other JSON data? Or is there another preferred method for collecting this information? I've previously used free APIs, but I've found their data can be unreliable, or they switch to a subscription model (like Financial Modeling Prep). Any suggestions are appreciated!


Solution

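  • Yes. The page embeds the table data in the JavaScript variable originalData, and the array literal assigned to it is valid JSON. You can use the re and json modules to extract and decode it: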

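    import json
    import re

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.macrotrends.net/stocks/charts/GM/general-motors/income-statement?freq=Q'

    html = requests.get(url).text

    # Capture the array literal assigned to originalData.
    # The greedy match runs to the last ']' on that line.
    match = re.search(r'var originalData = (\[.*\])', html)
    data = json.loads(match.group(1))

    for d in data:
        # field_name may contain HTML markup; keep only its text
        d['field_name'] = BeautifulSoup(d['field_name'], 'html.parser').text
        # popup_icon is chart-widget markup, not table data
        del d['popup_icon']

    df = pd.DataFrame(data)

    print(df)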
    

    Prints:

                                 field_name   2020-03-31   2019-12-31   2019-09-30   2019-06-30   2019-03-31   2018-12-31  ...   2011-03-31   2010-12-31   2010-09-30   2010-06-30   2010-03-31 2009-12-31 2008-12-31
    0                               Revenue  32709.00000  30826.00000  35473.00000  36060.00000  34878.00000  38399.00000  ...  36194.00000  36900.00000  34060.00000  33174.00000  31476.00000                      
    1                    Cost Of Goods Sold  30082.00000  29098.00000  31161.00000  31471.00000  31535.00000  35092.00000  ...  31850.00000  33171.00000  29587.00000  28609.00000  27553.00000                      
    2                          Gross Profit   2627.00000   1728.00000   4312.00000   4589.00000   3343.00000   3307.00000  ...   4344.00000   3729.00000   4473.00000   4565.00000   3923.00000                      
    3     Research And Development Expenses                                                                                ...                   0.00000                                                             
    4                         SG&A Expenses   1970.00000   2282.00000   2008.00000   2102.00000   2099.00000   2478.00000  ...   2994.00000   3432.00000   2710.00000   2623.00000   2684.00000                      
    5    Other Operating Income Or Expenses                                                                                ...     -6.00000     -3.00000    -30.00000    -39.00000    -46.00000                      
    6                    Operating Expenses  32052.00000  31380.00000  33169.00000  33573.00000  33634.00000  37570.00000  ...  35245.00000  36603.00000  32327.00000  31271.00000  30283.00000                      
    7                      Operating Income    657.00000   -554.00000   2304.00000   2487.00000   1244.00000    829.00000  ...    949.00000    297.00000   1733.00000   1903.00000   1193.00000                      
    8    Total Non-Operating Income/Expense    118.00000   -655.00000    278.00000    440.00000    624.00000   -886.00000  ...    455.00000    747.00000    114.00000   -341.00000    109.00000                      
    9                        Pre-Tax Income    775.00000  -1209.00000   2582.00000   2927.00000   1868.00000    -57.00000  ...   1404.00000   1026.00000   1847.00000   1562.00000   1302.00000                      
    10                         Income Taxes    357.00000   -163.00000    271.00000    524.00000    137.00000   -611.00000  ...    137.00000   -173.00000    -25.00000    361.00000    509.00000                      
    11                   Income After Taxes    418.00000  -1046.00000   2311.00000   2403.00000   1731.00000    554.00000  ...   1267.00000   1199.00000   1872.00000   1201.00000    793.00000                      
    12                         Other Income                                                                                ...                   0.00000                                                             
    13    Income From Continuous Operations    286.00000   -192.00000   2311.00000   2403.00000   2145.00000   2069.00000  ...   3411.00000   1406.00000   2223.00000   1612.00000   1196.00000                      
    14  Income From Discontinued Operations                                                                                ...                   0.00000                                                             
    15                           Net Income    247.00000   -232.00000   2313.00000   2381.00000   2119.00000   1992.00000  ...   3151.00000   1406.00000   1959.00000   1334.00000    865.00000                      
    16                               EBITDA   3965.00000   2732.00000   5613.00000   5894.00000   5360.00000   4475.00000  ...    949.00000    297.00000   1733.00000   1903.00000   1193.00000                      
    17                                 EBIT    657.00000   -554.00000   2304.00000   2487.00000   1244.00000    829.00000  ...    949.00000    297.00000   1733.00000   1903.00000   1193.00000                      
    18             Basic Shares Outstanding   1433.00000   1424.00000   1428.00000   1420.00000   1417.00000   1411.00000  ...   1504.00000   1612.90300   1500.00000   1500.00000   1500.00000                      
    19                   Shares Outstanding   1440.00000   1439.00000   1442.00000   1438.00000   1436.00000   1431.00000  ...   1817.00000   1612.90300   1630.00000   1567.00000   1567.00000                      
    20                            Basic EPS      0.17000     -0.18000      1.62000      1.68000      1.50000      1.43000  ...      2.09000      0.31000      1.31000      0.89000      0.58000                      
    21             EPS - Earnings Per Share      0.17000     -0.17000      1.60000      1.66000      1.48000      1.40000  ...      1.77000      0.31000      1.20000      0.85000      0.55000    0.00000    0.00000
    
    [22 rows x 44 columns]
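
As a variation on the approach above, here is a minimal standard-library sketch that does not depend on where the closing bracket falls on the line. It assumes the page embeds the data as `var originalData = [...]`, as shown in the question. `SAMPLE_HTML`, `extract_js_array`, and `to_number` are hypothetical names used only for illustration, and the sample string is a shortened stand-in for the downloaded page, not real output. `json.JSONDecoder.raw_decode` parses one JSON value starting at a given index and ignores whatever follows it, so trailing JavaScript on the same line is not a problem.

```python
import json
import re

# Hypothetical, shortened stand-in for requests.get(url).text.
# A second array on the same line would confuse a greedy regex.
SAMPLE_HTML = r"""<script>
var originalData = [{"field_name":"<a href='\/stocks\/charts\/GM\/general-motors\/revenue'>Revenue<\/a>","popup_icon":"<div><\/div>","2020-03-31":"32709.00000","2009-12-31":""}]; var other = [1, 2, 3];
</script>"""


def extract_js_array(html, var_name):
    """Return the JSON value assigned to `var_name` in `html`."""
    match = re.search(r"var\s+" + re.escape(var_name) + r"\s*=\s*", html)
    if match is None:
        raise ValueError(f"{var_name} not found in page source")
    # Parse exactly one JSON value starting right after the '='.
    value, _end = json.JSONDecoder().raw_decode(html, match.end())
    return value


def to_number(text):
    """Convert a string such as "32709.00000" to float; empty strings become None."""
    return float(text) if text else None


rows = extract_js_array(SAMPLE_HTML, "originalData")

# Keep only the date columns of the first row and convert them to numbers.
quarters = {
    key: to_number(value)
    for key, value in rows[0].items()
    if key not in ("field_name", "popup_icon")
}

print(quarters)  # {'2020-03-31': 32709.0, '2009-12-31': None}
```

Converting the values matters because every figure arrives as a string, and missing quarters arrive as empty strings, so any arithmetic on the raw data would fail or silently concatenate text.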