I am trying to write a scraper that extracts a table from this Wikipedia page. The problem is that I can extract every table on the page EXCEPT the one I actually need (the table containing the stats of every election ever held in the United States). I do not think the problem is with my tag.
Here is my code:
from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup
from urllib.request import urlopen
# getting the wiki page
page_info = urlopen('https://en.wikipedia.org/wiki/United_States_presidential_election')
soup = BeautifulSoup(page_info, 'html.parser')
headline = soup.find('table', "wikitable sortable jquery-tablesorter")
print(headline)
I think I am missing something crucial, but I cannot wrap my head around it. Can someone please help me?
One way of doing this would be:
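import pandas as pd
import requests
from io import StringIO
from bs4 import BeautifulSoup

# Fetch the raw HTML as served (no JavaScript is executed here)
page = requests.get('https://en.wikipedia.org/wiki/United_States_presidential_election').text
soup = BeautifulSoup(page, 'html.parser')
# In the served HTML the table's class is "wikitable sortable";
# "jquery-tablesorter" is only added later by JavaScript in the browser
table = soup.find('table', class_="wikitable sortable")
# read_html returns a list of DataFrames; wrapping the markup in StringIO
# avoids the FutureWarning newer pandas versions emit for literal HTML strings
df = pd.concat(pd.read_html(StringIO(str(table))))
print(df)
df.to_csv("elections.csv", index=False)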
Which outputs:
Year Party ... Electoral votes Notes
0 1788 Independent ... 69 / 138 NaN
1 1788 Federalist ... 34 / 138 NaN
2 1788 Federalist ... 9 / 138 NaN
3 1788 Federalist ... 6 / 138 NaN
4 1788 Federalist ... 6 / 138 NaN
.. ... ... ... ... ...
[219 rows x 8 columns]
It also writes the same rows to a .csv file, elections.csv.
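Note: whenever you're scraping, inspect the page with JavaScript (JS) turned off in your browser. BeautifulSoup only sees the HTML the server sends; it doesn't see content that JS renders or modifies afterwards. That's why you're not getting anything back: the jquery-tablesorter class is added by JS in the browser, so in the served HTML the class of the table you're after is just wikitable sortable.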
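To see the difference without hitting the network, here is a small sketch. The HTML string is a hypothetical, trimmed-down stand-in for what the server sends (the real page has many more attributes and rows); it only reproduces the point that the served class attribute lacks jquery-tablesorter. It also lists the classes BeautifulSoup can actually see, and shows that a CSS selector such as "table.sortable.wikitable" matches each class independently, whereas passing the string "wikitable sortable jquery-tablesorter" to find() requires that exact attribute value.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the HTML the server sends.
# In the browser, JavaScript later adds "jquery-tablesorter" to the class,
# but that class is NOT present in this raw HTML.
SERVED_HTML = """
<table class="wikitable">
  <tr><th>Other</th></tr>
</table>
<table class="wikitable sortable">
  <tr><th>Year</th><th>Party</th></tr>
  <tr><td>1788</td><td>Independent</td></tr>
</table>
"""

soup = BeautifulSoup(SERVED_HTML, "html.parser")

# List the class attribute of every table BeautifulSoup can actually see.
classes = [t.get("class") for t in soup.find_all("table")]
print(classes)

# Exact-string match on the browser-only class list finds nothing.
missing = soup.find("table", class_="wikitable sortable jquery-tablesorter")
print(missing)

# Matching the class string that exists in the served HTML works.
table = soup.find("table", class_="wikitable sortable")
print(table.find("td").get_text())

# A CSS selector matches each class independently, in any order.
same_table = soup.select_one("table.sortable.wikitable")
print(same_table is table)
```

Running the same list comprehension over the soup built from the live page is a quick way to check which class names you can actually match on before writing the find() call.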