Tags: python-3.x, web-scraping, beautifulsoup, urllib

How can I extract a table from Wikipedia using Beautiful Soup?


I am trying to write a scraper that extracts a table from this Wikipedia page. The problem is that I can extract every table on the page EXCEPT the one I actually need (the table containing the stats for every presidential election ever held in the United States). I do not think the problem is with my tag.
Here is my code:

from urllib.error import HTTPError
from urllib.error import URLError
from bs4 import BeautifulSoup
from urllib.request import urlopen

#getting the wiki page
page_info=urlopen('https://en.wikipedia.org/wiki/United_States_presidential_election')

soup=BeautifulSoup(page_info, 'html.parser')

headline=soup.find('table', "wikitable sortable jquery-tablesorter")
print(headline)

I think I am missing something crucial, but I cannot wrap my head around it. Can someone help me, please?


Solution

  • One way of doing this would be:

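    from io import StringIO

    import pandas as pd
    import requests
    from bs4 import BeautifulSoup

    url = 'https://en.wikipedia.org/wiki/United_States_presidential_election'
    page = requests.get(url).text
    soup = BeautifulSoup(page, 'html.parser')

    # In the raw HTML the table's class is "wikitable sortable";
    # "jquery-tablesorter" is only added later by JavaScript in the browser.
    table = soup.find('table', class_="wikitable sortable")

    # read_html returns a list of DataFrames; newer pandas versions expect
    # a file-like object rather than a literal HTML string.
    df = pd.read_html(StringIO(str(table)))
    df = pd.concat(df)
    print(df)
    df.to_csv("elections.csv", index=False)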
    
    

    Which outputs:

         Year                                    Party  ... Electoral votes      Notes
    0    1788                              Independent  ...        69 / 138        NaN
    1    1788                               Federalist  ...        34 / 138        NaN
    2    1788                               Federalist  ...         9 / 138        NaN
    3    1788                               Federalist  ...         6 / 138        NaN
    4    1788                               Federalist  ...         6 / 138        NaN
    ..    ...                                      ...  ...             ...        ...
    [219 rows x 8 columns]
    

    It also writes the same table to a .csv file (elections.csv).

    Note: Whenever you're scraping, check the page with JS (JavaScript) turned off. BeautifulSoup doesn't see dynamically rendered content. That's why you're not getting anything back: the jquery-tablesorter class is added by JavaScript in the browser, so without JS the class of the tag you're after is just "wikitable sortable".
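
The class mismatch described in the note can be reproduced offline. This is a minimal sketch, assuming a small hand-written HTML snippet (`RAW_HTML` is hypothetical and only stands in for the HTML the server sends, not the real Wikipedia markup). It shows that a multi-word class string passed to `find` is compared against the exact value of the `class` attribute, so the class string copied from the browser inspector finds nothing.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the server-sent HTML (before any JavaScript runs).
RAW_HTML = """
<table class="wikitable sortable">
  <tr><th>Year</th><th>Party</th></tr>
  <tr><td>1788</td><td>Independent</td></tr>
</table>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")

# Class string as seen in the browser inspector, including the JS-added class.
# A multi-word string must equal the whole class attribute, so this misses.
browser_match = soup.find("table", "wikitable sortable jquery-tablesorter")

# Class string as it appears in the raw HTML: exact match, so this succeeds.
raw_match = soup.find("table", class_="wikitable sortable")

# A CSS selector matches on individual classes, regardless of their order
# or of any extra classes on the tag.
css_match = soup.select_one("table.wikitable.sortable")

print(browser_match)
print(raw_match is not None)
print(css_match is not None)
```

If the order of the classes or the presence of extra classes is uncertain, the CSS selector form is probably the more forgiving choice, since it does not depend on the exact attribute string.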