I'm trying to pull all the information in the table found on https://www.macrotrends.net/stocks/charts/GM/general-motors/income-statement?freq=Q. While experimenting in the Chrome developer console, I was able to use document.querySelectorAll("[role=row]") to collect all the rows I wanted.
My problem is that I need to collect this information in Python. Using BeautifulSoup4, I was able to collect everything on the page, but the information appears to be structured differently: nothing comes up when I try to select elements by the roles or class names I saw in the browser. Below is a snippet from the downloaded page that contains the information I want (it doesn't look like anything I saw in the browser):
var originalData = [{"field_name":"<a href='\/stocks\/charts\/GM\/general-motors\/revenue'>Revenue<\/a>","popup_icon":"<div class='ajax-chart' data-tipped-options=\"ajax: {data: { t: 'GM', s: 'revenue', freq: 'Q', statement: 'income-statement' }}\"><i style='font-size:18px; color:#337ab7;' class='fas fa-chart-bar'><\/i><\/span><\/div>","2020-03-31":"32709.00000","2019-12-31":"30826.00000","2019-09-30":"35473.00000","2019-06-30":"36060.00000","2019-03-31":"34878.00000","2018-12-31":"38399.00000","2018-09-30":"35791.00000","2018-06-30":"36760.00000","2018-03-31":"36099.00000","2017-12-31":"37715.00000","2017-09-30":"33623.00000","2017-06-30":"36984.00000","2017-03-31":"37266.00000","2016-12-31":"35647.00000","2016-09-30":"38889.00000","2016-06-30":"37383.00000","2016-03-31":"37265.00000","2015-12-31":"22990.00000","2015-09-30":"38843.00000","2015-06-30":"38180.00000","2015-03-31":"35712.00000","2014-12-31":"39617.00000","2014-09-30":"39255.00000","2014-06-30":"39649.00000","2014-03-31":"37408.00000","2013-12-31":"40485.00000","2013-09-30":"38983.00000","2013-06-30":"39075.00000","2013-03-31":"36884.00000","2012-12-31":"39307.00000","2012-09-30":"37576.00000","2012-06-30":"37614.00000","2012-03-31":"37759.00000","2011-12-31":"37990.00000","2011-09-30":"36719.00000","2011-06-30":"39373.00000","2011-03-31":"36194.00000","2010-12-31":"36900.00000","2010-09-30":"34060.00000","2010-06-30":"33174.00000","2010-03-31":"31476.00000","2009-12-31":"","2008-12-31":""},...
That's the information for the first row, "Revenue". Excuse my ignorance, but this looks like a JavaScript variable holding everything in JSON format. Is there a way I can extract that variable and parse it like any other JSON data? Or is there a preferred method for collecting this information? I've used free APIs before, but I've found their data can be unreliable, or they switch to a subscription model (like Financial Modeling Prep). Any suggestions are appreciated!
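A minimal offline sketch of the symptom described above: the rows seen in the browser are created by JavaScript after the page loads, so the HTML that Python downloads has no [role=row] elements, only a script containing the originalData array. The raw_html string is a hypothetical, heavily trimmed stand-in (the container div and its id are made up); only the originalData line mirrors the snippet from the question.

```python
import json
import re

from bs4 import BeautifulSoup

# Hypothetical, trimmed stand-in for the downloaded HTML. The div id is made up;
# the originalData line copies the shape of the question's snippet, including
# the JSON-escaped slashes (\/).
raw_html = r"""
<html><body>
<div id="table-container"></div>
<script>
var originalData = [{"field_name":"<a href='\/stocks\/charts\/GM\/general-motors\/revenue'>Revenue<\/a>","2020-03-31":"32709.00000"}];
</script>
</body></html>
"""

soup = BeautifulSoup(raw_html, "html.parser")

# No JavaScript runs here, so the browser-generated rows never exist.
rows = soup.select("[role=row]")

# The data lives inside a <script> tag as a JavaScript array literal.
script = next(s for s in soup.find_all("script") if s.string and "originalData" in s.string)
match = re.search(r"var originalData = (\[.*\])", script.string)

# The captured array is valid JSON; "\/" decodes to "/".
data = json.loads(match.group(1))

print(len(rows))              # 0
print(data[0]["2020-03-31"])  # 32709.00000
```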
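The table is built in the browser by JavaScript from the originalData array, which is why the [role=row] elements don't exist in the HTML you download. You can use the re and json modules to extract and decode that array: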
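import re
import json
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://www.macrotrends.net/stocks/charts/GM/general-motors/income-statement?freq=Q'
html = requests.get(url).text
# The table data is embedded in the page as a JavaScript array: var originalData = [...]
match = re.search(r'var originalData = (\[.*\])', html)
data = json.loads(match.group(1))
for d in data:
    # 'field_name' holds an HTML link; keep only its text (e.g. "Revenue")
    d['field_name'] = BeautifulSoup(d['field_name'], 'html.parser').text
    # 'popup_icon' is only chart-widget markup, so drop it
    del d['popup_icon']
df = pd.DataFrame(data)
print(df)
Prints: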
field_name 2020-03-31 2019-12-31 2019-09-30 2019-06-30 2019-03-31 2018-12-31 ... 2011-03-31 2010-12-31 2010-09-30 2010-06-30 2010-03-31 2009-12-31 2008-12-31
0 Revenue 32709.00000 30826.00000 35473.00000 36060.00000 34878.00000 38399.00000 ... 36194.00000 36900.00000 34060.00000 33174.00000 31476.00000
1 Cost Of Goods Sold 30082.00000 29098.00000 31161.00000 31471.00000 31535.00000 35092.00000 ... 31850.00000 33171.00000 29587.00000 28609.00000 27553.00000
2 Gross Profit 2627.00000 1728.00000 4312.00000 4589.00000 3343.00000 3307.00000 ... 4344.00000 3729.00000 4473.00000 4565.00000 3923.00000
3 Research And Development Expenses ... 0.00000
4 SG&A Expenses 1970.00000 2282.00000 2008.00000 2102.00000 2099.00000 2478.00000 ... 2994.00000 3432.00000 2710.00000 2623.00000 2684.00000
5 Other Operating Income Or Expenses ... -6.00000 -3.00000 -30.00000 -39.00000 -46.00000
6 Operating Expenses 32052.00000 31380.00000 33169.00000 33573.00000 33634.00000 37570.00000 ... 35245.00000 36603.00000 32327.00000 31271.00000 30283.00000
7 Operating Income 657.00000 -554.00000 2304.00000 2487.00000 1244.00000 829.00000 ... 949.00000 297.00000 1733.00000 1903.00000 1193.00000
8 Total Non-Operating Income/Expense 118.00000 -655.00000 278.00000 440.00000 624.00000 -886.00000 ... 455.00000 747.00000 114.00000 -341.00000 109.00000
9 Pre-Tax Income 775.00000 -1209.00000 2582.00000 2927.00000 1868.00000 -57.00000 ... 1404.00000 1026.00000 1847.00000 1562.00000 1302.00000
10 Income Taxes 357.00000 -163.00000 271.00000 524.00000 137.00000 -611.00000 ... 137.00000 -173.00000 -25.00000 361.00000 509.00000
11 Income After Taxes 418.00000 -1046.00000 2311.00000 2403.00000 1731.00000 554.00000 ... 1267.00000 1199.00000 1872.00000 1201.00000 793.00000
12 Other Income ... 0.00000
13 Income From Continuous Operations 286.00000 -192.00000 2311.00000 2403.00000 2145.00000 2069.00000 ... 3411.00000 1406.00000 2223.00000 1612.00000 1196.00000
14 Income From Discontinued Operations ... 0.00000
15 Net Income 247.00000 -232.00000 2313.00000 2381.00000 2119.00000 1992.00000 ... 3151.00000 1406.00000 1959.00000 1334.00000 865.00000
16 EBITDA 3965.00000 2732.00000 5613.00000 5894.00000 5360.00000 4475.00000 ... 949.00000 297.00000 1733.00000 1903.00000 1193.00000
17 EBIT 657.00000 -554.00000 2304.00000 2487.00000 1244.00000 829.00000 ... 949.00000 297.00000 1733.00000 1903.00000 1193.00000
18 Basic Shares Outstanding 1433.00000 1424.00000 1428.00000 1420.00000 1417.00000 1411.00000 ... 1504.00000 1612.90300 1500.00000 1500.00000 1500.00000
19 Shares Outstanding 1440.00000 1439.00000 1442.00000 1438.00000 1436.00000 1431.00000 ... 1817.00000 1612.90300 1630.00000 1567.00000 1567.00000
20 Basic EPS 0.17000 -0.18000 1.62000 1.68000 1.50000 1.43000 ... 2.09000 0.31000 1.31000 0.89000 0.58000
21 EPS - Earnings Per Share 0.17000 -0.17000 1.60000 1.66000 1.48000 1.40000 ... 1.77000 0.31000 1.20000 0.85000 0.55000 0.00000 0.00000
[22 rows x 44 columns]
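As the output shows, every value is still a string (and missing quarters are empty strings), with one row per line item. One possible follow-up, sketched here on a small hypothetical excerpt whose numbers are copied from the output above, is to transpose so each row is a quarter and convert the values to numbers. The excerpt stands in for the full df; the reshaping choices (transpose, sorting oldest first) are a suggestion, not part of the original answer.

```python
import pandas as pd

# Hypothetical excerpt shaped like the df built above: one row per line item,
# one string column per quarter, "" where the quarter has no data.
df = pd.DataFrame([
    {"field_name": "Revenue", "2020-03-31": "32709.00000", "2019-12-31": "30826.00000", "2009-12-31": ""},
    {"field_name": "Gross Profit", "2020-03-31": "2627.00000", "2019-12-31": "1728.00000", "2009-12-31": ""},
])

# Use the line-item names as the index, then transpose so each row is a quarter.
tidy = df.set_index("field_name").T

# Convert the strings to floats; errors="coerce" turns empty strings into NaN.
tidy = tidy.apply(pd.to_numeric, errors="coerce")

# Parse the date labels and order the quarters oldest first.
tidy.index = pd.to_datetime(tidy.index)
tidy = tidy.sort_index()

print(tidy)
```

With the full df, the same three steps give a numeric table with 43 quarters as rows and the 22 line items as columns.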