
Python scrapes all columns of a table, but I only want to see one of the columns


I am using Python to scrape the names of the members of the U.S. Congress from Ballotpedia (https://ballotpedia.org/List_of_current_members_of_the_U.S._Congress). My current code gives me all four columns of each of the two tables (Senate and House). Here is my current code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

urls = ['https://ballotpedia.org/List_of_current_members_of_the_U.S._Congress']

all_tables = pd.read_html(urls[0])
senators = all_tables[3]
house_members = all_tables[6]
congress = pd.concat([senators, house_members])  # DataFrame.append was removed in pandas 2.0

congress.to_csv('3-New Congressmen.csv')

I've been trying to work with lines 7-10 but haven't found a way to get only the names of the legislators. I'm only interested in the name column.

Is my mistake in ignoring the inspect function of the Ballotpedia page? Or is an extra line of code needed to specify which column I want? Thank you very much for your help!


Solution

  • To get only the names of the legislators, select the "Name" column from each table:

    import pandas as pd
    
    url = "https://ballotpedia.org/List_of_current_members_of_the_U.S._Congress"
    
    dfs = pd.read_html(url)
    
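    senators = dfs[3]["Name"]       # Senate table, "Name" column only
    house_members = dfs[6]["Name"]  # House table, "Name" column only
    
    # Stack the two columns and write them without the row index
    pd.concat([senators, house_members]).to_csv("out.csv", index=False)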
    

    This creates out.csv containing only the names. The combined column, printed with its row index, starts with:

    0             Richard Shelby
    1           Tommy Tuberville
    2             Lisa Murkowski
    3         Daniel S. Sullivan
    4                 Mark Kelly
    5             Kyrsten Sinema
    6               John Boozman
    7                 Tom Cotton
    8           Dianne Feinstein
    9               Alex Padilla
    10            Michael Bennet
    ...
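
The column selection can also be tried offline. The following is a minimal sketch that uses two small hand-built tables in place of the scraped ones; the "Office" column and its values are assumptions made up for illustration, and only the "Name" column mirrors the answer above. It writes to an in-memory buffer instead of out.csv so nothing is left on disk.

```python
import io

import pandas as pd

# Stand-in tables (hypothetical): only the "Name" column mirrors the answer;
# the "Office" column and its values are placeholders, not Ballotpedia data.
senators = pd.DataFrame(
    {
        "Office": ["Senate seat A", "Senate seat B"],
        "Name": ["Richard Shelby", "Tommy Tuberville"],
    }
)
house_members = pd.DataFrame(
    {
        "Office": ["House seat A"],
        "Name": ["Example Member"],
    }
)

# Indexing a DataFrame with a column label returns that column as a Series.
names = pd.concat(
    [senators["Name"], house_members["Name"]],
    ignore_index=True,  # renumber rows 0..n-1 instead of restarting at 0
)

# index=False keeps the row numbers out of the CSV; the "Name" header stays.
buffer = io.StringIO()
names.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

With the real page, the same `["Name"]` selection applies to `dfs[3]` and `dfs[6]`. Those positions depend on the page layout at the time of scraping, so it may be worth printing each table's `columns` first to confirm which tables are the right ones.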
    
