python, beautifulsoup, wiki

Scraping a Wikipedia table using "tr" and "td" with BeautifulSoup and Python


Total Python 3 beginner here. I can't get just the names of the colleges to print. The table's class attribute is nowhere near the college names, and I can't narrow the find_all() call down to what I need. I'd also like to write the names to a new CSV file. Any ideas?

import requests
from bs4 import BeautifulSoup
import csv


res = requests.get("https://en.wikipedia.org/wiki/Ivy_League")
soup = BeautifulSoup(res.text, "html.parser")
colleges = soup.find_all("table", class_="wikitable sortable")

for college in colleges:
    first_level = college.find_all("tr")
    print(first_level)

Solution

  • You can use soup.select() with CSS selectors to target the cells more precisely:

    import requests
    from bs4 import BeautifulSoup
    
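    res = requests.get("https://en.wikipedia.org/wiki/Ivy_League")
    soup = BeautifulSoup(res.text, "html.parser")
    
    # Second <table> that is a direct child of the article body -> each row ->
    # first <td> cell in the row -> any link inside it (the institution name)
    l = soup.select(".mw-parser-output > table:nth-of-type(2) > tbody > tr > td:nth-of-type(1) a")
    for each in l:
        print(each.text)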
    

    Printed result:

    Brown University
    Columbia University
    Cornell University
    Dartmouth College
    Harvard University
    University of Pennsylvania
    Princeton University
    Yale University
    

    To write that single column to a CSV file:

    import pandas as pd
    # to_csv() writes the DataFrame index as the first column by default;
    # pass index=False to leave it out
    pd.DataFrame([e.text for e in l]).to_csv("your_csv.csv")
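Since the question asks about "tr" and "td" specifically, the same idea can be sketched with find_all() instead of a CSS selector. This is a minimal sketch under assumptions: it assumes the member-institution table is the first table whose class list contains "wikitable", and that each name is the link in the row's first <td> cell (the selector above makes a similar assumption). Wikipedia's layout changes over time, so check the page if it returns nothing. The function name college_names is hypothetical.

```python
from bs4 import BeautifulSoup


def college_names(html):
    """Hypothetical helper: return the first-cell link text of each data row
    in the first "wikitable" table of the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # class_="wikitable" matches any table whose class list contains
    # "wikitable", so "wikitable sortable" tables match too.
    table = soup.find("table", class_="wikitable")
    if table is None:
        return []
    names = []
    for row in table.find_all("tr"):
        first_cell = row.find("td")  # header rows contain only <th> cells
        if first_cell is None:
            continue
        link = first_cell.find("a")
        # Prefer the link text; fall back to the whole cell's text.
        target = link if link is not None else first_cell
        names.append(target.get_text(strip=True))
    return names


if __name__ == "__main__":
    import requests

    res = requests.get("https://en.wikipedia.org/wiki/Ivy_League")
    for name in college_names(res.text):
        print(name)
```

Looping over find_all("tr") and taking only row.find("td") is what narrows the question's first_level list down to one cell per row.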
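The question already imports csv, so the standard-library csv module can write the column without pandas and without an index column. A minimal sketch; the helper name write_names_csv and the "College" header are illustrative choices, not part of the original answer:

```python
import csv


def write_names_csv(names, path, header="College"):
    """Hypothetical helper: write a header row, then one name per row."""
    # newline="" stops the csv module from emitting blank lines on Windows.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([header])
        # writerows expects an iterable of rows, so wrap each name in a list.
        writer.writerows([name] for name in names)
```

With the l list from the selector code: write_names_csv([e.text for e in l], "colleges.csv")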