Search code examples
pythoncsvweb-scrapingextractscreen-scraping

How do I extract entire table and store it in CSV file?


I am trying to scrape entire table and want to store it in .csv file. While I am trying to scrape this data it is showing me error as NO TABLES FOUND.

Here is my code.

from pandas.io.html import read_html
page = 'https://games.crossfit.com/leaderboard/open/2020?view=0&division=1&scaled=0&sort=0'

tables = read_html(page,  attrs={"class":"desktop athletes"})

print ("Extracted {num} tables".format(num=len(tables)))

Any suggestion or guidance or any help ?


Solution

  • As stated below, you can access the api to get the data. To save as CSV, you'll need to work through the json format to get what you need (ie. flatten out the nested data). There's 2 ways to do it, a) completely flatten it out so that each row is for each entrant, or b) have separate rows for each entrant for each of their ordinal scores.

    The only differences will be if you choose a) you'll have a really wide table (but no repeated data), and if you go with b) you'll have a long table, with repeat of data.

    Since it's not too big of a file, I went with option b) so you can always groupby particular columns or filter:

    import requests
    import pandas as pd
    
    r = requests.get('https://games.crossfit.com/competitions/api/v1/competitions/open/2020/leaderboards?view=0&division=1&scaled=0&sort=0')
    
    data = r.json()
    
    results = pd.DataFrame()
    df = pd.DataFrame(data['leaderboardRows'])
    
    for idx, row in df.iterrows():
        entrantData = pd.Series()
        scoresData = pd.DataFrame()
        entrantResults = pd.DataFrame()
        for idx2, each in row.iteritems():
            if type(each) == dict:
                temp = pd.DataFrame.from_dict(each, orient='index')
                entrantData = entrantData.append(temp)
            elif type(each) == list:
                temp2 = pd.DataFrame(each)
                scoresData = scoresData.append(temp2, sort=True).reset_index(drop=True)
            else:
                entrantData = entrantData.append(pd.Series(each, name=idx2))
    
        entrantResults = entrantResults.append(scoresData, sort=True).reset_index(drop=True)
        entrantResults = entrantResults.merge(pd.concat([entrantData.T] *5, ignore_index=True), left_index=True, right_index=True)
    
        results = results.append(entrantResults, sort=True).reset_index(drop=True)
    
    results.to_csv('file.csv', index=False)
    

    Output: first 15 rows of 250

    print (results.head(15).to_string())
                   affiliate affiliateId     affiliateName age                                   breakdown competitorId        competitorName countryChampion countryOfOriginCode countryOfOriginName divisionId drawBlueHR firstName gender heat  height highlight             judge lane     lastName mobileScoreDisplay nextStage  ordinal overallRank overallScore postCompStatus          profilePicS3key rank scaled       score scoreDisplay       scoreIdentifier status time video  weight
    0       CrossFit Nanaimo        1918  CrossFit Nanaimo  30                                   10 rounds       158264       Patrick Vellner           False                  CA              Canada          1        NaN   Patrick      M        71 in     False     Dallyn Giroux           Vellner                                     1           1           64                 d471c-P158264_7-184.jpg   13      0    11800382         8:38  9d3979222412df2842a1    ACT  518     0  195 lb
    1    CrossFit Soul Miami        1918  CrossFit Nanaimo  30                  29 rounds +\n2 thrusters\n       158264       Patrick Vellner           False                  CA              Canada          1        NaN   Patrick      M        71 in     False      Lamar Vernon           Vellner                                     2           1           64                 d471c-P158264_7-184.jpg   19      0  1009880000     988 reps  9bd66b00e8367cc7fd0c    ACT  NaN     0  195 lb
    2       CrossFit Nanaimo        1918  CrossFit Nanaimo  30                                    165 reps       158264       Patrick Vellner           False                  CA              Canada          1        NaN   Patrick      M        71 in     False    Jason Lochhead           Vellner                                     3           1           64                 d471c-P158264_7-184.jpg   12      0  1001650151         6:29  2347b4cb7339f2a13e6c    ACT  389     0  195 lb
    3       CrossFit Nanaimo        1918  CrossFit Nanaimo  30                                    240 reps       158264       Patrick Vellner           False                  CA              Canada          1        NaN   Patrick      M        71 in     False     Dallyn Giroux           Vellner                                     4           1           64                 d471c-P158264_7-184.jpg   18      0  1002400211        16:29  bcfd3882df3fa2e99451    ACT  989     0  195 lb
    4   CrossFit New England        1918  CrossFit Nanaimo  30                                    240 reps       158264       Patrick Vellner           False                  CA              Canada          1        NaN   Patrick      M        71 in     False      Matt O'Keefe           Vellner                                     5           1           64                 d471c-P158264_7-184.jpg    2      0  1002400591        10:09  4bb25bed5f71141da122    ACT  609     0  195 lb
    5        CrossFit Mayhem        3220   CrossFit Mayhem  30                                   10 rounds       153604         Mathew Fraser            True                  US       United States          1        NaN    Mathew      M        67 in     False  darren hunsucker            Fraser                                     1           2           74                 9e218-P153604_4-184.jpg    8      0    11800392         8:28  18b5b2e137f00a2d9d7d    ACT  508     0  195 lb
    6    CrossFit Soul Miami        3220   CrossFit Mayhem  30  28 rounds +\n4 thrusters\n3 toes-to-bars\n       153604         Mathew Fraser            True                  US       United States          1        NaN    Mathew      M        67 in     False      Daniel Lopez            Fraser                                     2           2           74                 9e218-P153604_4-184.jpg   40      0  1009590000     959 reps  b96bc1b7b58fa34a28a1    ACT  NaN     0  195 lb
    7        CrossFit Mayhem        3220   CrossFit Mayhem  30                                    165 reps       153604         Mathew Fraser            True                  US       United States          1        NaN    Mathew      M        67 in     False   Jason Fernandez            Fraser                                     3           2           74                 9e218-P153604_4-184.jpg    3      0  1001650172         6:08  4f4a994a045652c894c5    ACT  368     0  195 lb
    8        CrossFit Mayhem        3220   CrossFit Mayhem  30                                    240 reps       153604         Mathew Fraser            True                  US       United States          1        NaN    Mathew      M        67 in     False   Tasia Percevecz            Fraser                                     4           2           74                 9e218-P153604_4-184.jpg    2      0  1002400338        14:22  1a4a7d8760e72bb12d68    ACT  862     0  195 lb
    9        CrossFit Mayhem        3220   CrossFit Mayhem  30                                    240 reps       153604         Mathew Fraser            True                  US       United States          1        NaN    Mathew      M        67 in     False    Kelley Jackson            Fraser                                     5           2           74                 9e218-P153604_4-184.jpg   21      0  1002400555        10:45  b4a259e7049f47f65356    ACT  645     0  195 lb
    10                   NaN           0                    30                                   10 rounds       514502  Lefteris Theofanidis            True                  GR              Greece          1        NaN  Lefteris      M       171 cm     False               NaN       Theofanidis                                     1           3           94                 931eb-P514502_2-184.jpg    1      0    11800415         8:05  c8907e02512f42ff3142    ACT  485     1   81 kg
    11                   NaN           0                    30                   30 rounds +\n1 thruster\n       514502  Lefteris Theofanidis            True                  GR              Greece          1        NaN  Lefteris      M       171 cm     False               NaN       Theofanidis                                     2           3           94                 931eb-P514502_2-184.jpg    3      0  1010210000    1021 reps  63add31b22606957701c    ACT  NaN     1   81 kg
    12                   NaN           0                    30                                    165 reps       514502  Lefteris Theofanidis            True                  GR              Greece          1        NaN  Lefteris      M       171 cm     False               NaN       Theofanidis                                     3           3           94                 931eb-P514502_2-184.jpg   13      0  1001650148         6:32  46d7cdb691c25ea38dbe    ACT  392     1   81 kg
    13                   NaN           0                    30                                    240 reps       514502  Lefteris Theofanidis            True                  GR              Greece          1        NaN  Lefteris      M       171 cm     False               NaN       Theofanidis                                     4           3           94                 931eb-P514502_2-184.jpg    4      0  1002400300        15:00  d49e55a2af5840740071    ACT  900     1   81 kg
    14                   NaN           0                    30                                    240 reps       514502  Lefteris Theofanidis            True                  GR              Greece          1        NaN  Lefteris      M       171 cm     False               NaN       Theofanidis                                     5           3           94                 931eb-P514502_2-184.jpg   73      0  1002400529        11:11  d35c9d687eb6b72c8e36    ACT  671     1   81 kg