How can I combine the full lists into a DataFrame? When I print it, it seems to show only the first record, and the output also includes \n and other redundancies such as ' characters.
import requests
from requests_html import HTML, HTMLSession
from bs4 import BeautifulSoup
import pandas as pd
import csv
import json
url = 'https://lehighsports.com/sports/mens-soccer/schedule/2018'
lehigh = requests.get(url).text
soup = BeautifulSoup(lehigh,'lxml')
for opp in soup.find_all('div',class_="sidearm-schedule-game-opponent-text"):
    opp_list = []
    opp_list.append(opp.text)
# print(opp_list)
for conf in soup.find_all('div',class_="sidearm-schedule-game-conference-conference"):
    conf_list = []
    conf_list.append(conf.text)
# print(conf_list)
dict = {'opponent':[opp_list],'conference':[conf_list]}
df = pd.DataFrame(dict)
print(df)
You are resetting opp_list and conf_list to [] on every iteration of the loops. Initialize them only once, before each loop starts. Also, you don't need the extra brackets when creating the dictionary: {'opponent':opp_list,'conference':conf_list} is enough, because each list already supplies one value per row.
To remove the whitespace, you can use the .get_text() method with the strip=True and separator= parameters.
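The first point can be seen without any scraping. Here is a minimal sketch; the names list and its values are made-up sample data, not taken from the page:

# Hypothetical sample values standing in for the scraped text
names = ["UConn", "Drexel", "Rider"]

# Pattern from the question: the list is recreated on every pass,
# so only the item appended in the final pass is left afterwards
for name in names:
    buggy = []
    buggy.append(name)

# Fixed pattern: create the list once, then append inside the loop
fixed = []
for name in names:
    fixed.append(name)

print(buggy)  # ['Rider']
print(fixed)  # ['UConn', 'Drexel', 'Rider']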
For example:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://lehighsports.com/sports/mens-soccer/schedule/2018'
lehigh = requests.get(url).text
soup = BeautifulSoup(lehigh,'lxml')
opp_list = []
for opp in soup.find_all('div',class_="sidearm-schedule-game-opponent-text"):
    opp_list.append(opp.get_text(strip=True, separator=' '))
conf_list = []
for conf in soup.find_all('div',class_="sidearm-schedule-game-conference-conference"):
    conf_list.append(conf.get_text(strip=True))
# Use a name other than "dict" so the built-in type is not shadowed
data = {'opponent':opp_list,'conference':conf_list}
df = pd.DataFrame(data)
print(df)
Prints:
                         opponent       conference
0                        at UConn
1                       vs Drexel
2            at George Washington
3                   at St. John's
4                   vs Binghamton
5                        at Rider
6                         vs Penn
7                         at Army  Patriot League*
8                      vs Cornell
9                     at Boston U  Patriot League*
10                 vs #20 Colgate  Patriot League*
11                        vs Navy  Patriot League*
12                   at Lafayette  Patriot League*
13                   at Dartmouth
14                    vs American  Patriot League*
15                    at Bucknell  Patriot League*
16                at Loyola (Md.)  Patriot League*
17     vs Holy Cross Senior Night  Patriot League*
18  vs No. 3 Colgate (Semifinals)
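
To see what strip=True and separator= each contribute, here is a small offline sketch. The HTML fragment and the "opponent" class name are made up for illustration (the real page markup may differ), and it uses the built-in "html.parser" so lxml is not needed:

from bs4 import BeautifulSoup

# Made-up fragment, loosely imitating a schedule entry; not the real page markup
html = """
<div class="opponent">
    <span>vs</span>
    <a href="#">Holy Cross</a>
    <span>Senior Night</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="opponent")

raw = div.text                                   # keeps newlines and indentation
joined = div.get_text(strip=True)                # strips each piece, glues them together
clean = div.get_text(strip=True, separator=" ")  # strips each piece, joins with a space

print(repr(raw))
print(joined)  # vsHoly CrossSenior Night
print(clean)   # vs Holy Cross Senior Night

Without separator=' ' the stripped pieces run together, which is why the opponent column uses it while the single-piece conference column does not need it.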