python csv web-scraping beautifulsoup export-to-csv

Web Scraping to .csv


I have been using the following script to scrape data from a website and export it to a .csv file:

import requests
from bs4 import BeautifulSoup
import pandas as pd

res = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/tournament-LCS%20Summer%202020/')

soup = BeautifulSoup(res.text, 'html.parser')

table = soup.find("table", class_="table_list playerslist tablesaw trhover")

columns = [i.get_text(strip=True) for i in table.find("thead").find_all("th")]

data = []

table.find("thead").extract()

for tr in table.find_all("tr"):
    data.append([td.get_text(strip=True) for td in tr.find_all("td")])

df = pd.DataFrame(data, columns=columns)

df.to_csv("S10-NA-AVGs.csv", index=False)

I am having trouble adapting this same script to collect other data and export it to .csv. The page in question is: https://gol.gg/game/stats/25989/page-fullstats/

I understand that the data is laid out differently in the HTML of that page, and that is where I am unsure what the script should be looking for. The individual fields seem to be stored in a different element, so I tried changing this line:

columns = [i.get_text(strip=True) for i in table.find("thead").find_all("th")]

That is where I am receiving the error message:

AttributeError: 'NoneType' object has no attribute 'find'

I tried changing "th" and "thead" to a few different variations, but without success.
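
The message says `'NoneType' object has no attribute 'find'`, so it is `table` itself that is `None`: `soup.find("table", class_=...)` found no table with that exact class string on the new page. (Had only the `thead` been missing, the complaint would have been about `find_all`.) Below is a minimal sketch of a guard. It runs on a made-up HTML snippet rather than the real page, and the class name `completestats` is purely hypothetical:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page whose table uses a different class.
html = """
<table class="completestats">
  <tr><th>Player</th><th>Huni</th></tr>
  <tr><td>Kills</td><td>2</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# The class string from the first script does not occur here, so find() returns None.
table = soup.find("table", class_="table_list playerslist tablesaw trhover")
missing = table is None

if missing:
    # Fall back to the first table on the page instead of calling .find() on None.
    table = soup.find("table")

# Collect header and data cells alike, since this table has no <thead>.
rows = [
    [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
    for tr in table.find_all("tr")
]
print(missing, rows)
```

Checking for `None` before chaining further calls makes it obvious which lookup failed, which is usually the quickest way to see what has to change for a differently structured page.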


Solution

  • How about letting pandas do the whole job, since you already use it?

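    from io import StringIO

    import pandas as pd
    import requests

    res = requests.get('https://gol.gg/game/stats/25989/page-fullstats/')

    # read_html returns a list of DataFrames, one per <table> on the page;
    # skiprows=[0] drops the first row of each table. The HTML is wrapped in
    # StringIO because recent pandas versions deprecate passing a raw HTML string.
    tables = pd.read_html(StringIO(res.text), skiprows=[0])
    df = pd.concat(tables)
    df.to_csv("data.csv", index=False)
    print(df)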
    
    

    Output:

                          Player   Huni Svenskeren  ...  Ryoma Cody Sun    Poome
    0                       Role    TOP     JUNGLE  ...    MID      ADC  SUPPORT
    1                      Kills      2          0  ...      5        4        2
    2                     Deaths      5          6  ...      2        2        1
    3                    Assists      3          5  ...     10       12       16
    4                        KDA      1        0.8  ...    7.5        8       18
    5                         CS    186        136  ...    210      217       27
    6        CS in Team's Jungle      4         80  ...      8        8        0
    7         CS in Enemy Jungle      0          0  ...      0        6        0
    8                        CSM    7.6        5.5  ...    8.6      8.8      1.1
    9                      Golds   8723       7059  ...  11074    11275     7255
    10                       GPM    355        288  ...    451      459      296
    11                     GOLD%  21.9%      17.7%  ...  20.5%    20.8%    13.4%
    12              Vision Score     14         24  ...     27       37       52
    13              Wards placed      7          7  ...      9        9       34
    14           Wards destroyed      4          3  ...      3       10        5
    15   Control Wards Purchased      0          6  ...      7        2       10
    16                      VSPM   0.57       0.98  ...    1.1     1.51     2.12
    17                       WPM   0.29       0.29  ...   0.37     0.37     1.38
    18                      VWPM      0       0.24  ...   0.29     0.08     0.41
    19                      WCPM   0.16       0.12  ...   0.12     0.41      0.2
    20                       VS%     9%      15.4%  ...  15.6%    21.4%    30.1%
    21  Total damage to Champion  11637      11069  ...   9516    12053     3669
    22           Physical Damage   6533       9367  ...    166    11214      604
    23              Magic Damage   5104        395  ...   9340      755     3065
    24               True Damage      0       1307  ...     10       84        0
    25                       DPM    474        451  ...    388      491      149
    26                      DMG%  24.1%      22.9%  ...  17.4%      22%     6.7%
    27            K+A Per Minute    0.2        0.2  ...   0.61     0.65     0.73
    28                       KP%  83.3%      83.3%  ...  65.2%    69.6%    78.3%
    29                Solo kills    NaN        NaN  ...    NaN      NaN      NaN
    30              Double kills      0          0  ...      1        2        0
    31              Triple kills      0          0  ...      0        0        0
    32              Quadra kills      0          0  ...      0        0        0
    33               Penta kills      0          0  ...      0        0        0
    34                     GD@15  -2492      -1117  ...    -21    -1272     -292
    35                    CSD@15     -9        -27  ...    -29       -1       -6
    36                    XPD@15  -1149      -1627  ...   -191     -287    -1322
    37                   LVLD@15     -1         -1  ...      0        0       -1
    38   Damage dealt to turrets      0        883  ...   1557     4582      717
    39                Total heal   1010       5737  ...   2600     2343     3120
    40     Damage self mitigated  16638      10704  ...  16506     5476    11927
    41         Time ccing others     26         16  ...     18       26       11
    42        Total damage taken  18869      19320  ...  14264    11844     9137
    

    This gets you a nice .csv file.

    Bonus: the code also works with the other URL:

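    from io import StringIO

    import pandas as pd
    import requests

    res = requests.get('https://gol.gg/teams/list/season-ALL/split-ALL/tournament-LCS%20Summer%202020/')

    # Note: skiprows=[0] drops this table's header row, so the first team row
    # ends up as the column names in the printed frame.
    tables = pd.read_html(StringIO(res.text), skiprows=[0])
    df = pd.concat(tables)
    print(df)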
    
    

    Prints:

            100 Thieves  S10  NA  18  38.9%  ...  33.3  1976  3.0  1.23  1.35
    0               CLG  S10 NaN  19  26.3%  ...  32.6  1790  3.3  1.21  1.30
    1            Cloud9  S10 NaN  18  72.2%  ...  33.4  1971  3.0  1.12  1.30
    2          Dignitas  S10 NaN  19  31.6%  ...  32.7  1590  3.1  1.27  1.33
    3     Evil Geniuses  S10 NaN  18  44.4%  ...  32.2  1920  3.3  1.39  1.41
    4          FlyQuest  S10 NaN  18  66.7%  ...  32.8  1856  3.3  1.21  1.77
    5  Golden Guardians  S10 NaN  18  50.0%  ...  33.8  1992  3.4  1.26  1.53
    6         Immortals  S10 NaN  18  22.2%  ...  31.1  1717  3.3  1.35  1.46
    7       Team Liquid  S10 NaN  18  83.3%  ...  33.6  1784  3.4  1.24  1.51
    8               TSM  S10 NaN  18  66.7%  ...  32.5  1741  3.2  1.33  1.33
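
In the printed frame above, the first team row (100 Thieves) has become the column header. That is the effect of `skiprows=[0]` on a table that already has a header row: the header is skipped and the next row is promoted in its place. The sketch below illustrates this on a made-up table, not the real page, and suggests that dropping `skiprows` is the better choice for such tables. It assumes a parser backend that `read_html` can use (`lxml`, or `bs4` with `html5lib`) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical two-team table with a proper <thead>, standing in for the real page.
html = """
<table>
  <thead><tr><th>Name</th><th>Season</th><th>Games</th></tr></thead>
  <tbody>
    <tr><td>100 Thieves</td><td>S10</td><td>18</td></tr>
    <tr><td>CLG</td><td>S10</td><td>19</td></tr>
  </tbody>
</table>
"""

# With skiprows=[0] the header row is dropped and the first team becomes the header.
skipped = pd.read_html(StringIO(html), skiprows=[0])[0]

# Without skiprows the <thead> row is kept as the column names.
kept = pd.read_html(StringIO(html))[0]

print(list(skipped.columns))
print(list(kept.columns))
```

For the full-stats page the first row is apparently not a useful header, which is presumably why `skiprows=[0]` was used there. It is worth checking `df.columns` for each page before writing the CSV.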