I have been using the following script to scrape some data from a website and export it to a .csv file:
import requests
from bs4 import BeautifulSoup
import pandas as pd
res = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/tournament-LCS%20Summer%202020/')
soup = BeautifulSoup(res.text, 'html.parser')
table = soup.find("table", class_="table_list playerslist tablesaw trhover")
columns = [i.get_text(strip=True) for i in table.find("thead").find_all("th")]
data = []
table.find("thead").extract()
for tr in table.find_all("tr"):
data.append([td.get_text(strip=True) for td in tr.find_all("td")])
df = pd.DataFrame(data, columns=columns)
df.to_csv("S10-NA-AVGs.csv", index=False)
I am having issues using this same script to collect other data and export it to .csv. The website in question is: https://gol.gg/game/stats/25989/page-fullstats/
I understand that the data is laid out differently in the HTML, and that is where I am a little mixed up about what the script should be looking for. The individual fields seem to be stored in a different element, so I tried to change this line:
columns = [i.get_text(strip=True) for i in table.find("thead").find_all("th")]
That is where I am receiving the error message:
AttributeError: 'NoneType' object has no attribute 'find'
I tried changing "th" and "thead" to a few different variations but was unsuccessful.
How about using pandas to do the whole job, since you already use it?
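For context on the error itself: AttributeError: 'NoneType' object has no attribute 'find' means one of the chained find() calls returned None -- most likely the stats page's table doesn't carry that exact class, or it has no <thead>. If you do want to stay with BeautifulSoup, a guarded version (the class name below is only a placeholder, inspect the page for the real one) makes the failure point obvious:

import requests
from bs4 import BeautifulSoup

res = requests.get('https://gol.gg/game/stats/25989/page-fullstats/')
soup = BeautifulSoup(res.text, 'html.parser')

# placeholder class -- replace with whatever class the stats table actually uses
table = soup.find("table", class_="your-table-class")
if table is None:
    raise SystemExit("No table matched that class -- adjust the selector")

thead = table.find("thead")
if thead is None:
    # some tables keep their header cells in the first <tr> instead of a <thead>
    header_cells = table.find("tr").find_all(["th", "td"])
else:
    header_cells = thead.find_all("th")

columns = [c.get_text(strip=True) for c in header_cells]
print(columns)

That said, the simpler route is to let pandas do the table parsing: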
import requests
import pandas as pd
res = requests.get('https://gol.gg/game/stats/25989/page-fullstats/')
# read_html returns one DataFrame per <table> on the page;
# skiprows=[0] drops the first row of each table
tables = pd.read_html(res.text, skiprows=[0])
df = pd.concat(tables)
df.to_csv("data.csv", index=False)
print(df)
Output:
[ Player Huni Svenskeren ... Ryoma Cody Sun Poome
0 Role TOP JUNGLE ... MID ADC SUPPORT
1 Kills 2 0 ... 5 4 2
2 Deaths 5 6 ... 2 2 1
3 Assists 3 5 ... 10 12 16
4 KDA 1 0.8 ... 7.5 8 18
5 CS 186 136 ... 210 217 27
6 CS in Team's Jungle 4 80 ... 8 8 0
7 CS in Enemy Jungle 0 0 ... 0 6 0
8 CSM 7.6 5.5 ... 8.6 8.8 1.1
9 Golds 8723 7059 ... 11074 11275 7255
10 GPM 355 288 ... 451 459 296
11 GOLD% 21.9% 17.7% ... 20.5% 20.8% 13.4%
12 Vision Score 14 24 ... 27 37 52
13 Wards placed 7 7 ... 9 9 34
14 Wards destroyed 4 3 ... 3 10 5
15 Control Wards Purchased 0 6 ... 7 2 10
16 VSPM 0.57 0.98 ... 1.1 1.51 2.12
17 WPM 0.29 0.29 ... 0.37 0.37 1.38
18 VWPM 0 0.24 ... 0.29 0.08 0.41
19 WCPM 0.16 0.12 ... 0.12 0.41 0.2
20 VS% 9% 15.4% ... 15.6% 21.4% 30.1%
21 Total damage to Champion 11637 11069 ... 9516 12053 3669
22 Physical Damage 6533 9367 ... 166 11214 604
23 Magic Damage 5104 395 ... 9340 755 3065
24 True Damage 0 1307 ... 10 84 0
25 DPM 474 451 ... 388 491 149
26 DMG% 24.1% 22.9% ... 17.4% 22% 6.7%
27 K+A Per Minute 0.2 0.2 ... 0.61 0.65 0.73
28 KP% 83.3% 83.3% ... 65.2% 69.6% 78.3%
29 Solo kills NaN NaN ... NaN NaN NaN
30 Double kills 0 0 ... 1 2 0
31 Triple kills 0 0 ... 0 0 0
32 Quadra kills 0 0 ... 0 0 0
33 Penta kills 0 0 ... 0 0 0
34 GD@15 -2492 -1117 ... -21 -1272 -292
35 CSD@15 -9 -27 ... -29 -1 -6
36 XPD@15 -1149 -1627 ... -191 -287 -1322
37 LVLD@15 -1 -1 ... 0 0 -1
38 Damage dealt to turrets 0 883 ... 1557 4582 717
39 Total heal 1010 5737 ... 2600 2343 3120
40 Damage self mitigated 16638 10704 ... 16506 5476 11927
41 Time ccing others 26 16 ... 18 26 11
42 Total damage taken 18869 19320 ... 14264 11844 9137
This gets you a nice .csv file.
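One caveat: pd.read_html parses every <table> on the page, so the pd.concat step can mix unrelated tables together. If that happens, you can narrow the selection with read_html's match (a regex searched against each table's text) or attrs (HTML attributes the table must have) parameters. The values below are only examples -- adjust them to the table you actually want:

import requests
import pandas as pd

res = requests.get('https://gol.gg/game/stats/25989/page-fullstats/')

# keep only tables whose text matches the regex "Kills" (example value)
tables = pd.read_html(res.text, match="Kills", skiprows=[0])

# or select by attribute -- this class string is the one from your first script;
# exact-match behaviour depends on the parser, so verify it against the page source
# tables = pd.read_html(res.text, attrs={"class": "table_list playerslist tablesaw trhover"})

df = pd.concat(tables)
df.to_csv("data.csv", index=False)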
Bonus: the code also works with the other URL:
import requests
import pandas as pd
res = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/tournament-LCS%20Summer%202020/')
df = pd.read_html(res.text, skiprows=[0])
df = pd.concat(df)
print(df)
Prints:
100 Thieves S10 NA 18 38.9% ... 33.3 1976 3.0 1.23 1.35
0 CLG S10 NaN 19 26.3% ... 32.6 1790 3.3 1.21 1.30
1 Cloud9 S10 NaN 18 72.2% ... 33.4 1971 3.0 1.12 1.30
2 Dignitas S10 NaN 19 31.6% ... 32.7 1590 3.1 1.27 1.33
3 Evil Geniuses S10 NaN 18 44.4% ... 32.2 1920 3.3 1.39 1.41
4 FlyQuest S10 NaN 18 66.7% ... 32.8 1856 3.3 1.21 1.77
5 Golden Guardians S10 NaN 18 50.0% ... 33.8 1992 3.4 1.26 1.53
6 Immortals S10 NaN 18 22.2% ... 31.1 1717 3.3 1.35 1.46
7 Team Liquid S10 NaN 18 83.3% ... 33.6 1784 3.4 1.24 1.51
8 TSM S10 NaN 18 66.7% ... 32.5 1741 3.2 1.33 1.33
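One thing to watch in that second output: because of skiprows=[0], the table's real header row is dropped and the first team ("100 Thieves") ends up as the column names. If you want to keep the original headers for the teams page, it should be enough to drop skiprows (assuming, as in your first script, that the header lives in a <thead>, which read_html picks up automatically):

import requests
import pandas as pd

res = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/tournament-LCS%20Summer%202020/')

# no skiprows: the header row from the page is kept as the column names
df = pd.concat(pd.read_html(res.text))
df.to_csv("S10-NA-AVGs.csv", index=False)
print(df)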