I am using Python to scrape the names of the members of the U.S. Congress from Ballotpedia (https://ballotpedia.org/List_of_current_members_of_the_U.S._Congress). My current code returns all four columns from each of the two tables (Senate and House). Here is my current code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
urls = ['https://ballotpedia.org/List_of_current_members_of_the_U.S._Congress']
all_tables = pd.read_html(urls[0])
senators = all_tables[3]
house_members = all_tables[6]
congress = pd.concat([senators, house_members])
congress.to_csv('3-New Congressmen.csv')
I've been trying to work with lines 7-10 but haven't found a way to get only the names of the legislators. I'm only interested in the Name column.
Is my mistake that I'm not using the browser's Inspect tool on the Ballotpedia page? Or do I need an extra line of code to specify which column I want? Thank you very much for your help!
To get only the names of the legislators, select the "Name" column from each table:
import pandas as pd
url = "https://ballotpedia.org/List_of_current_members_of_the_U.S._Congress"
dfs = pd.read_html(url)
senators = dfs[3]["Name"]  # Senate table, "Name" column only
house_members = dfs[6]["Name"]  # House table, "Name" column only
pd.concat([senators, house_members]).to_csv("out.csv", index=False)
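This creates out.csv. The combined names look like this (the row index on the left is not written to the file because of index=False):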
0 Richard Shelby
1 Tommy Tuberville
2 Lisa Murkowski
3 Daniel S. Sullivan
4 Mark Kelly
5 Kyrsten Sinema
6 John Boozman
7 Tom Cotton
8 Dianne Feinstein
9 Alex Padilla
10 Michael Bennet
...
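The table positions 3 and 6 depend on the current page layout, so they may shift if Ballotpedia changes the page. As a rough sketch of an alternative, the helper below keeps every table that has a "Name" column instead of relying on fixed positions. The function name collect_names, the "Office" and "Key"/"Value" columns, and the "Example Member" row are made up for illustration; the small hand-built DataFrames only stand in for the list that pd.read_html(url) returns.

```python
import pandas as pd


def collect_names(tables, column="Name"):
    """Return one Series holding `column` from every table that has it."""
    matching = [t[column] for t in tables if column in t.columns]
    if not matching:
        raise ValueError(f"no table has a {column!r} column")
    # ignore_index=True renumbers the rows 0..n-1 instead of
    # restarting the index at 0 for each table.
    return pd.concat(matching, ignore_index=True)


# Hypothetical stand-ins for the output of pd.read_html(url).
tables = [
    pd.DataFrame({"Key": ["a"], "Value": ["b"]}),  # unrelated table, skipped
    pd.DataFrame({
        "Office": ["Senate seat 1", "Senate seat 2"],  # assumed column
        "Name": ["Richard Shelby", "Lisa Murkowski"],
    }),
    pd.DataFrame({
        "Office": ["House seat 1"],  # assumed column
        "Name": ["Example Member"],  # placeholder, not real data
    }),
]

names = collect_names(tables)
print(names.tolist())
```

With the real page you would pass pd.read_html(url) as tables and then call names.to_csv("out.csv", index=False). Note that any other table on the page that happens to have a "Name" column would be included too, so check the result before relying on it.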