How to extract information from a scraped JSON string (Python)?


I am trying to scrape some data from FotMob (a football website), but when I fetch the HTML with requests and parse it with Beautiful Soup, I get back a huge string of text that looks like JSON. An extract is shown below:

{"id":9902,"teamId":9902,"nameAndSubstatValue":{"name":"Ipswich Town","substatValue":10},"statValue":"5.2","rank":13,"type":"teams","statFormat":"fraction","substatFormat":"number"},{"id":8283,"teamId":8283,"nameAndSubstatValue":{"name":"Barnsley","substatValue":5},"statValue":"5.2","rank":14,"type":"teams","statFormat":"fraction","substatFormat":"number"}

The code I used to get this is shown here:

import requests
from bs4 import BeautifulSoup

url = "https://www.fotmob.com/leagues/108/stats/season/17835/teams/expected_goals_team/league-one-teams"
r = requests.get(url)
html_doc = r.text
soup = BeautifulSoup(html_doc, "html.parser")
# The data sits inside the <script id="__NEXT_DATA__"> tag
for p in soup.find_all('script', attrs={'id': '__NEXT_DATA__'}):
    print(p.text)

Specifically, I want to access statValue, name and substatValue and put them into a pandas DataFrame. Does anyone know how to do this?


Solution

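  • The text inside the <script id="__NEXT_DATA__"> tag is a JSON document, so parse it with json.loads and then index into the resulting dictionary to reach the list of team records: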

    import json
    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    
    url = "https://www.fotmob.com/leagues/108/stats/season/17835/teams/expected_goals_team/league-one-teams"
    r = requests.get(url)
    html_doc = r.text
    soup = BeautifulSoup(html_doc, "html.parser")
    
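    # The script tag holds the page's data as one JSON document
    data = json.loads(soup.find("script", attrs={"id": "__NEXT_DATA__"}).text)
    
    # Drill down to the list of per-team records
    d = data["props"]["pageProps"]["initialState"]["leagueSeasonStats"]["statsData"]
    df = pd.DataFrame(d)
    # Expand the nested "nameAndSubstatValue" dicts into "name" and "substatValue" columns
    df = pd.concat([df, df.pop("nameAndSubstatValue").apply(pd.Series)], axis=1)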
    
    print(df)
    

    Prints:

           id  teamId statValue  rank   type statFormat substatFormat                 name  substatValue
    0    8462    8462       8.9     1  teams   fraction        number           Portsmouth            12
    1    8451    8451       7.3     2  teams   fraction        number    Charlton Athletic             9
    2    9792    9792       7.3     3  teams   fraction        number        Burton Albion             4
    3    8671    8671       6.3     4  teams   fraction        number   Accrington Stanley             8
    4    9833    9833       6.2     5  teams   fraction        number          Exeter City             9
    5   10170   10170       6.1     6  teams   fraction        number         Derby County             3
    6    8677    8677       5.9     7  teams   fraction        number  Peterborough United            12
    7    8401    8401       5.8     8  teams   fraction        number      Plymouth Argyle             8
    8    8559    8559       5.7     9  teams   fraction        number     Bolton Wanderers             5
    9    8676    8676       5.3    10  teams   fraction        number    Wycombe Wanderers             8
    10  10163   10163       5.3    11  teams   fraction        number  Sheffield Wednesday             7
    11   8680    8680       5.3    12  teams   fraction        number      Cheltenham Town             3
    12   9902    9902       5.2    13  teams   fraction        number         Ipswich Town            10
    13   8283    8283       5.2    14  teams   fraction        number             Barnsley             5
    14   8653    8653       5.0    15  teams   fraction        number        Oxford United             3
    15   9799    9799       4.3    16  teams   fraction        number            Port Vale             5
    16  45723   45723       4.3    17  teams   fraction        number       Fleetwood Town             4
    17   9828    9828       4.0    18  teams   fraction        number  Forest Green Rovers             4
    18   9896    9896       3.7    19  teams   fraction        number      Shrewsbury Town             2
    19   9834    9834       3.5    20  teams   fraction        number     Cambridge United             5
    20  10104   10104       3.2    21  teams   fraction        number       Bristol Rovers             7
    21   8430    8430       2.9    22  teams   fraction        number         Lincoln City             4
    22   8489    8489       2.6    23  teams   fraction        number            Morecambe             2
    23   8645    8645       2.2    24  teams   fraction        number   Milton Keynes Dons             3
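
  • If you only need statValue, name and substatValue, a variation on the approach above is to flatten the records with pd.json_normalize and keep just those columns. The sketch below is one possible way to do that, not the only one. It runs offline against a small inline HTML string: SAMPLE_HTML, extract_stats and stats_to_frame are hypothetical names made up for this example, the two records are copied from the extract in the question, and the nesting is assumed to match the key path used above. Note that statValue arrives as a string ("5.2"), so it is converted with pd.to_numeric.

```python
import json

import pandas as pd
from bs4 import BeautifulSoup

# Inline stand-in for the downloaded page (an assumption for this example).
# The two records come from the extract in the question.
SAMPLE_HTML = """
<html><body>
<script id="__NEXT_DATA__" type="application/json">
{"props": {"pageProps": {"initialState": {"leagueSeasonStats": {"statsData": [
  {"id": 9902, "teamId": 9902,
   "nameAndSubstatValue": {"name": "Ipswich Town", "substatValue": 10},
   "statValue": "5.2", "rank": 13, "type": "teams",
   "statFormat": "fraction", "substatFormat": "number"},
  {"id": 8283, "teamId": 8283,
   "nameAndSubstatValue": {"name": "Barnsley", "substatValue": 5},
   "statValue": "5.2", "rank": 14, "type": "teams",
   "statFormat": "fraction", "substatFormat": "number"}
]}}}}}
</script>
</body></html>
"""


def extract_stats(html):
    """Return the statsData records embedded in the page as a list of dicts."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("script", attrs={"id": "__NEXT_DATA__"})
    if tag is None:
        # Fail with a clear message if the page layout is not what we expect
        raise ValueError("no __NEXT_DATA__ script tag found in the page")
    data = json.loads(tag.string)
    return data["props"]["pageProps"]["initialState"]["leagueSeasonStats"]["statsData"]


def stats_to_frame(records):
    """Flatten the nested records and keep only the three requested columns."""
    # json_normalize turns nested keys into dotted column names
    df = pd.json_normalize(records)
    df = df.rename(columns={
        "nameAndSubstatValue.name": "name",
        "nameAndSubstatValue.substatValue": "substatValue",
    })
    # statValue is stored as a string in the JSON, so make it numeric
    df["statValue"] = pd.to_numeric(df["statValue"])
    return df[["name", "statValue", "substatValue"]]


df = stats_to_frame(extract_stats(SAMPLE_HTML))
print(df)
```

    To use this against the live page, pass r.text from the requests call above to extract_stats instead of SAMPLE_HTML. The key path may need adjusting if the site changes its page structure.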