I get the most recent list of traded companies from NASDAQ with this code however I'd like to have the results shown in a data-frame instead of just the list with all the other info I might not need.
Any ideas how could that be achieved? Thanks
from bs4 import BeautifulSoup
import requests
r=requests.get('https://www.nasdaq.com/screening/companies-by
industry.aspx
exchange=NASDAQ&sortname=marketcap&sorttype=1&pagesize=4000')
data = r.text
soup = BeautifulSoup(data, "html.parser")
table = soup.find( "table", {"id":"CompanylistResults"} )
for row in table.findAll("tr"):
for cell in row("td"):
print (cell.get_text().strip())
Looks like you are looking for the aptly named read_html, though you need to play around until you get what you want. In your case:
>>> import pandas as pd
>>> df=pd.read_html(table.prettify(),flavor='bs4')[0]
>>> df.columns = [c.strip() for c in df.columns]
See output below.
The first line is what gets the job done, and the second one just strips off all those pesky spaces and new lines in your header. Looks like there is a hidden ADR TSO
which seems useless, so you could drop it if you do not know what it is. It may also make sense to drop all even rows as they are just a continuation of the odd rows, and useless links as far as I can tell. In a single line:
>>> df = df.drop(['ADR TSO'], axis=1) #Drop useless column
>>> df1= df[::2] #To get rid of even rows
>>> df2= df[~df['Name'].str.contains('Stock Quote')].head() #By string filtration if we are not sure about the odd/even thing
Output of original head just for show:
>>> df.head()
Name Symbol Market Cap \
0 Amazon.com, Inc. AMZN $802.18B
1 AMZN Stock Quote AMZN Ratings AMZN Stock Report NaN NaN
2 Microsoft Corporation MSFT $789.12B
3 MSFT Stock Quote MSFT Ratings MSFT Stock Report NaN NaN
4 Alphabet Inc. GOOGL $740.3B
ADR TSO Country IPO Year \
0 NaN United States 1997
1 NaN NaN NaN
2 NaN United States 1986
3 NaN NaN NaN
4 NaN United States n/a
Subsector
0 Catalog/Specialty Distribution
1 NaN
2 Computer Software: Prepackaged Software
3 NaN
4 Computer Software: Programming, Data Processing
Output of cleaned df.head()
:
Name Symbol Market Cap Country IPO Year \
0 Amazon.com, Inc. AMZN $802.18B United States 1997
2 Microsoft Corporation MSFT $789.12B United States 1986
4 Alphabet Inc. GOOGL $740.3B United States n/a
6 Alphabet Inc. GOOG $735.24B United States 2004
8 Apple Inc. AAPL $720.3B United States 1980
Subsector
0 Catalog/Specialty Distribution
2 Computer Software: Prepackaged Software
4 Computer Software: Programming, Data Processing
6 Computer Software: Programming, Data Processing
8 Computer Manufacturing