TL;DR: I need to turn a BS4 ResultSet (a single-column list) into an NxN array. How do I do that, and how can I attach headers that are also a BS4 ResultSet? Code below. Thank you!
I am attempting to scrape sports data from the web, but I'm having trouble converting the ResultSet into an NxN array. I'm also trying to include headers that were scraped in the same manner. Here's my code so far:
from __future__ import print_function
import requests
from bs4 import BeautifulSoup
import numpy as np

url = input("Paste player link and specific year ")
r = requests.get(url)
html_content = r.text
soup = BeautifulSoup(html_content, "lxml")
body = soup.body
table = body.table
tbody = table.tbody
headers = table.find_all("th")
statistics = tbody.find_all("td")

def string_stats():
    for stat in statistics:
        print(stat.string)

def string_headers():
    for head in headers:
        print(head.string)

string_stats_list = string_stats()
string_stats_list
This results in a vertical list of just the td tag elements as strings (or that was the goal).
So, my questions are: How can I get this single column list into an NxN array/matrix? Additionally, how can I get the headers attached?
Thanks for reading and/or the help!
import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'http://www.footballdb.com/players/mike-evans-evansmi03/gamelogs'
r = requests.get(url)
html_content = r.content
soup = BeautifulSoup(html_content, "lxml")
body = soup.body
table = body.table  # first table on the page

# Column names: the text of every <th> cell in the table
headers = table.find_all("th")
headers_list = [i.text for i in headers]

# One inner list per <tr>, skipping the first (header) row
string_stats_list = []
row = []
for i in table.select('tr')[1:]:
    for j in i.select('td'):
        row.append(j.text)
    string_stats_list.append(row)
    row = []

# Rows become DataFrame rows; the scraped headers become column labels
df = pd.DataFrame(data=string_stats_list, columns=headers_list)
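If you already have the cells as one flat list of strings, as in the question, the list can also be cut into rows using the number of headers as the row width. The following is only a minimal sketch under stated assumptions: `chunk_rows` is a hypothetical helper, the header names and cell values are made-up placeholders rather than real scraped data, and it assumes every table row has exactly one cell per header.

```python
def chunk_rows(cells, n_cols):
    """Split a flat list of cell values into rows of n_cols items each."""
    if n_cols <= 0:
        raise ValueError("n_cols must be positive")
    if len(cells) % n_cols != 0:
        raise ValueError("cell count is not a multiple of the column count")
    return [cells[i:i + n_cols] for i in range(0, len(cells), n_cols)]


# Placeholder values for illustration only (not real scraped data).
headers = ["Date", "Opp", "Rec", "Yds"]
flat_cells = ["09/10", "ATL", "5", "99", "09/17", "CHI", "7", "93"]

# Two rows of four cells each.
rows = chunk_rows(flat_cells, len(headers))

# Pair each row with the headers to get one dict per table row.
records = [dict(zip(headers, row)) for row in rows]

print(rows)
print(records[0])
```

To build the flat list in the first place, the functions in the question would need to return the strings (for example `[stat.string for stat in statistics]`) instead of printing them; as written they return None. The resulting `rows` and `headers` can then be passed to `pd.DataFrame(data=rows, columns=headers)` in the same way as above.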