I'm trying to scrape from multiple Ballotpedia pages with Python and put this info into a csv, but am only getting the results for the last element of the list. Here is my code:
import pandas as pd
list = ['https://ballotpedia.org/Alaska_Supreme_Court',
        'https://ballotpedia.org/Utah_Supreme_Court']
for page in list:
    frame = pd.read_html(page, attrs={"class": "wikitable sortable jquery-tablesorter"})[0]
    frame.drop("Appointed By", axis=1, inplace=True)
    frame.to_csv("18-TEST.csv", index=False)
I've been playing around with adding and deleting parts of the last line of the code, but the issue remains. The first element of the list must be getting added to the csv, but then it gets replaced by the second element. How can I get both to show up in the csv at the same time? Thank you very much!
Every iteration resets your frame variable, so the previous page's data gets thrown away. You'll have to accumulate the entries all in one dataframe to save it all as one csv. Also, like piterbarg mentioned, list is the name of a built-in type in Python, and using it as a variable name shadows the built-in. It's not breaking your code but it is bad practice ;).
import pandas as pd

# better variable name "pages"
pages = ['https://ballotpedia.org/Alaska_Supreme_Court',
         'https://ballotpedia.org/Utah_Supreme_Court']

# list outside the loop to accumulate each page's frame in
frames = []

for page in pages:
    frame = pd.read_html(page, attrs={'class': 'wikitable sortable jquery-tablesorter'})[0]
    frame.drop('Appointed By', axis=1, inplace=True)
    # add this particular page's data to the list
    frames.append(frame)

# combine everything into one dataframe with pd.concat
# (DataFrame.append was removed in pandas 2.0);
# ignore_index ignores the indices from the frames we're combining,
# so the indices in the judges frame are continuous
judges = pd.concat(frames, ignore_index=True)

# after the loop, save the complete dataframe to a csv
judges.to_csv('18-TEST.csv', index=False)
This will save it all in one csv. Give that a try!
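If you want to check the accumulate-then-save pattern without hitting Ballotpedia, here is a minimal sketch. The two small frames and the names in them are made up as stand-ins for what read_html would return, and the csv is written to an in-memory buffer instead of 18-TEST.csv, so treat it as an illustration of the idea rather than the exact scraping code.

```python
import io

import pandas as pd

# Hypothetical stand-ins for the tables read_html would return, one per page.
alaska = pd.DataFrame({"Judge": ["Judge A", "Judge B"],
                       "Appointed By": ["Gov X", "Gov Y"]})
utah = pd.DataFrame({"Judge": ["Judge C"],
                     "Appointed By": ["Gov Z"]})

frames = []
for frame in (alaska, utah):
    # drop() without inplace returns a new frame, leaving the stand-in untouched
    frames.append(frame.drop("Appointed By", axis=1))

# One concat after the loop; ignore_index renumbers the rows continuously
judges = pd.concat(frames, ignore_index=True)

# Write once, after the loop, so no page overwrites another
buffer = io.StringIO()
judges.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Because the write happens a single time on the combined frame, rows from both pages end up under one header row.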