The first table isn't coming through when I scrape this election website:
url = https://electproject.github.io/Early-Vote-2020G/GA_RO.html
Here is the code:
import requests
from bs4 import BeautifulSoup
import pandas as pd
headers = {
    "accept": "application/json, text/javascript, */*; q=0.01",
    "accept-encoding": "gzip, deflate, br",
    "accept-language": "en-GB,en-US;q=0.9,en;q=0.8",
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.99 Safari/537.36",
    "x-requested-with": "XMLHttpRequest",
}
url = r'https://electproject.github.io/Early-Vote-2020G/GA_RO.html'
r = requests.get(url, headers=headers).text
soup = BeautifulSoup(r, 'html.parser')
I tried both of these, but still didn't get the first table with counties/votes/turnout rates:
tables = soup.find_all('table')
dfs = list()
for table in tables:
    df = pd.read_html(str(table))[0]
    dfs.append(df)
Other attempt:
df = pd.read_html(r, flavor='html5lib')
Both pull all the other tables but not the first. I assume it's due to the headers having sort capabilities, but I'm not sure.
The problem is that the first table is rendered with JavaScript; there is no <table> element in the HTML for that table.
What you can do is get the data from the JavaScript directly (inspect the page source to find the right <script> element):
import json
data = soup.find_all('script', {
    'data-for': 'htmlwidget-21712dd45dd736e3c1b9',
})[0].contents[0]
# The widget stores the table column by column, so transpose to get one row per county
df = pd.DataFrame(json.loads(data)['x']['data']).T
Output:
             0      1      2         3
0      APPLING   4453  12240  0.363807
1     ATKINSON   1431   4939  0.289735
2        BACON   3111   7071  0.439966
3        BAKER    872   2297  0.379626
4      BALDWIN  11850  27567  0.429862
..         ...    ...    ...       ...
154  WHITFIELD  16980  57014  0.297822
155     WILCOX   1582   4838  0.326995
156     WILKES   2958   7204  0.410605
157  WILKINSON   2236   6761   0.33072
158      WORTH   4473  14601  0.306349
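The widget ID above is hard-coded and looks generated, so it may change if the page is rebuilt. As a rough sketch of the same idea without the hard-coded ID, the snippet below collects every <script> element that has a data-for attribute and keeps those whose JSON has a column-wise x.data list. It uses only the standard library so it runs without a network connection. The names WidgetScriptParser and widget_rows are hypothetical helpers, and sample_html is a made-up stand-in for the page source, with a few values borrowed from the output above. It assumes the payload has the same shape as the one used in the answer, which may not hold for other widgets.

```python
import json
from html.parser import HTMLParser


class WidgetScriptParser(HTMLParser):
    """Collect the JSON payload of every <script data-for="..."> element."""

    def __init__(self):
        super().__init__()
        self.payloads = {}        # widget id -> parsed JSON
        self._current_id = None   # id of the script currently being read
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and "data-for" in attrs:
            self._current_id = attrs["data-for"]
            self._chunks = []

    def handle_data(self, data):
        if self._current_id is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._current_id is not None:
            self.payloads[self._current_id] = json.loads("".join(self._chunks))
            self._current_id = None


def widget_rows(html):
    """Return {widget_id: list of row tuples} for widgets with a column-wise x.data."""
    parser = WidgetScriptParser()
    parser.feed(html)
    parser.close()
    rows = {}
    for widget_id, payload in parser.payloads.items():
        x = payload.get("x") if isinstance(payload, dict) else None
        columns = x.get("data") if isinstance(x, dict) else None
        # Assumption: a table widget stores a list of columns, each itself a list
        if isinstance(columns, list) and columns and all(
            isinstance(col, list) for col in columns
        ):
            rows[widget_id] = list(zip(*columns))  # column-major to row-major
    return rows


# Made-up sample standing in for the real page source
sample_html = """
<div id="htmlwidget-example" class="datatables html-widget"></div>
<script type="application/json" data-for="htmlwidget-example">
{"x": {"data": [["APPLING", "ATKINSON"], [4453, 1431], [12240, 4939], [0.363807, 0.289735]]}}
</script>
"""

tables = widget_rows(sample_html)
for widget_id, table_rows in tables.items():
    print(widget_id, table_rows)
```

With the real page, the text returned by requests.get(url).text would take the place of sample_html, and each list of rows can be passed to pd.DataFrame if a DataFrame is wanted.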