I have been using the following script to scrape some data from a website and export it to a .csv file:
import requests
from bs4 import BeautifulSoup
import pandas as pd
res = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/tournament-LCS%20Summer%202020/')
soup = BeautifulSoup(res.text, 'html.parser')
table = soup.find("table", class_="table_list playerslist tablesaw trhover")
columns = [i.get_text(strip=True) for i in table.find("thead").find_all("th")]
data = []
table.find("thead").extract()
for tr in table.find_all("tr"):
data.append([td.get_text(strip=True) for td in tr.find_all("td")])
df = pd.DataFrame(data, columns=columns)
df.to_csv("S10-NA-AVGs.csv", index=False)
I am having issues using this same script to collect other data and export it to .csv. The website in question is: https://gol.gg/game/stats/25989/page-fullstats/
I understand that the data is laid out differently in the HTML, and that is where I am a little mixed up about what the script should be looking for. The individual fields seem to be stored in a different element, so I tried to change this line:
columns = [i.get_text(strip=True) for i in table.find("thead").find_all("th")]
That is where I am receiving the error message:
AttributeError: 'NoneType' object has no attribute 'find'
I tried changing "th" and "thead" to a few different variations but was unsuccessful.
How about using pandas to do the whole job, since you already use it?
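For context on the error itself: AttributeError: 'NoneType' object has no attribute 'find' means one of the chained find() calls returned None -- most likely the stats page's table doesn't carry that exact class, or it has no <thead>. If you do want to stay with BeautifulSoup, a guarded version (the class name below is only a placeholder, inspect the page for the real one) makes the failure point obvious:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://gol.gg/game/stats/25989/page-fullstats/')
soup = BeautifulSoup(res.text, 'html.parser')

# placeholder class -- replace with whatever class the stats table actually uses
table = soup.find("table", class_="your-table-class")
if table is None:
    raise SystemExit("No table matched that class -- adjust the selector")

thead = table.find("thead")
if thead is None:
    # some tables keep their header cells in the first <tr> instead of a <thead>
    header_cells = table.find("tr").find_all(["th", "td"])
else:
    header_cells = thead.find_all("th")

columns = [c.get_text(strip=True) for c in header_cells]
print(columns)

That said, the simpler route is to let pandas do the table parsing: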
import requests
import pandas as pd
res = requests.get('https://gol.gg/game/stats/25989/page-fullstats/')
# read_html returns one DataFrame per <table> on the page;
# skiprows=[0] drops the first row of each table
tables = pd.read_html(res.text, skiprows=[0])
df = pd.concat(tables)
df.to_csv("data.csv", index=False)
print(df)
Output:
[ Player Huni Svenskeren ... Ryoma Cody Sun Poome
0 Role TOP JUNGLE ... MID ADC SUPPORT
1 Kills 2 0 ... 5 4 2
2 Deaths 5 6 ... 2 2 1
3 Assists 3 5 ... 10 12 16
4 KDA 1 0.8 ... 7.5 8 18
5 CS 186 136 ... 210 217 27
6 CS in Team's Jungle 4 80 ... 8 8 0
7 CS in Enemy Jungle 0 0 ... 0 6 0
8 CSM 7.6 5.5 ... 8.6 8.8 1.1
9 Golds 8723 7059 ... 11074 11275 7255
10 GPM 355 288 ... 451 459 296
11 GOLD% 21.9% 17.7% ... 20.5% 20.8% 13.4%
12 Vision Score 14 24 ... 27 37 52
13 Wards placed 7 7 ... 9 9 34
14 Wards destroyed 4 3 ... 3 10 5
15 Control Wards Purchased 0 6 ... 7 2 10
16 VSPM 0.57 0.98 ... 1.1 1.51 2.12
17 WPM 0.29 0.29 ... 0.37 0.37 1.38
18 VWPM 0 0.24 ... 0.29 0.08 0.41
19 WCPM 0.16 0.12 ... 0.12 0.41 0.2
20 VS% 9% 15.4% ... 15.6% 21.4% 30.1%
21 Total damage to Champion 11637 11069 ... 9516 12053 3669
22 Physical Damage 6533 9367 ... 166 11214 604
23 Magic Damage 5104 395 ... 9340 755 3065
24 True Damage 0 1307 ... 10 84 0
25 DPM 474 451 ... 388 491 149
26 DMG% 24.1% 22.9% ... 17.4% 22% 6.7%
27 K+A Per Minute 0.2 0.2 ... 0.61 0.65 0.73
28 KP% 83.3% 83.3% ... 65.2% 69.6% 78.3%
29 Solo kills NaN NaN ... NaN NaN NaN
30 Double kills 0 0 ... 1 2 0
31 Triple kills 0 0 ... 0 0 0
32 Quadra kills 0 0 ... 0 0 0
33 Penta kills 0 0 ... 0 0 0
34 GD@15 -2492 -1117 ... -21 -1272 -292
35 CSD@15 -9 -27 ... -29 -1 -6
36 XPD@15 -1149 -1627 ... -191 -287 -1322
37 LVLD@15 -1 -1 ... 0 0 -1
38 Damage dealt to turrets 0 883 ... 1557 4582 717
39 Total heal 1010 5737 ... 2600 2343 3120
40 Damage self mitigated 16638 10704 ... 16506 5476 11927
41 Time ccing others 26 16 ... 18 26 11
42 Total damage taken 18869 19320 ... 14264 11844 9137
This gets you a nice .csv file.
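One caveat: pd.read_html parses every <table> on the page, so the pd.concat step can mix unrelated tables together. If that happens, you can narrow the selection with read_html's match (a regex searched against each table's text) or attrs (HTML attributes the table must have) parameters. The values below are only examples -- adjust them to the table you actually want:

import requests
import pandas as pd

res = requests.get('https://gol.gg/game/stats/25989/page-fullstats/')

# keep only tables whose text matches the regex "Kills" (example value)
tables = pd.read_html(res.text, match="Kills", skiprows=[0])

# or select by attribute -- this class string is the one from your first script;
# exact-match behaviour depends on the parser, so verify it against the page source
# tables = pd.read_html(res.text, attrs={"class": "table_list playerslist tablesaw trhover"})

df = pd.concat(tables)
df.to_csv("data.csv", index=False)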
Bonus: the code also works with the other URL:
import requests
import pandas as pd
res = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/tournament-LCS%20Summer%202020/')
df = pd.read_html(res.text, skiprows=[0])
df = pd.concat(df)
print(df)
Prints:
100 Thieves S10 NA 18 38.9% ... 33.3 1976 3.0 1.23 1.35
0 CLG S10 NaN 19 26.3% ... 32.6 1790 3.3 1.21 1.30
1 Cloud9 S10 NaN 18 72.2% ... 33.4 1971 3.0 1.12 1.30
2 Dignitas S10 NaN 19 31.6% ... 32.7 1590 3.1 1.27 1.33
3 Evil Geniuses S10 NaN 18 44.4% ... 32.2 1920 3.3 1.39 1.41
4 FlyQuest S10 NaN 18 66.7% ... 32.8 1856 3.3 1.21 1.77
5 Golden Guardians S10 NaN 18 50.0% ... 33.8 1992 3.4 1.26 1.53
6 Immortals S10 NaN 18 22.2% ... 31.1 1717 3.3 1.35 1.46
7 Team Liquid S10 NaN 18 83.3% ... 33.6 1784 3.4 1.24 1.51
8 TSM S10 NaN 18 66.7% ... 32.5 1741 3.2 1.33 1.33
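One thing to watch in that second output: because of skiprows=[0], the table's real header row is dropped and the first team ("100 Thieves") ends up as the column names. If you want to keep the original headers for the teams page, it should be enough to drop skiprows (assuming, as in your first script, that the header lives in a <thead>, which read_html picks up automatically):

import requests
import pandas as pd

res = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/tournament-LCS%20Summer%202020/')

# no skiprows: the header row from the page is kept as the column names
df = pd.concat(pd.read_html(res.text))
df.to_csv("S10-NA-AVGs.csv", index=False)
print(df)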