Tags: python, web-scraping, beautifulsoup, dropdown, html-select

Scraping data from a site where the URL doesn't change while changing options in a drop-down list


I'm using BeautifulSoup to scrape the table "Antwerp Weather History for 1 April 2017" on this webpage. I need more than that one date, though: I need every day in April 2017, and the days are chosen from a drop-down list.

In the inspector, the drop-down is a select tag with one option element per day.

I can get their values with the following code:

import requests
from bs4 import BeautifulSoup

prefix = 'https://www.timeanddate.com'
weather_request = requests.get(prefix + '/weather/belgium/antwerp/historic?month=4&year=2017')
# parse the response body with Python's built-in HTML parser
weather = BeautifulSoup(weather_request.content, 'html.parser')

for option in weather.select('select > option'):
    # append_to_mylist: helper defined elsewhere (not shown)
    append_to_mylist(option.get('value'), option.text)

How can I scrape the tables behind each of these values, given that the URL doesn't change when I pick a different option from the drop-down list?

I've found some similar questions, but they weren't about BeautifulSoup.


Solution

  • The data is loaded via Ajax from a different URL. The response is not JSON but raw JavaScript (the object keys are unquoted), so it needs some preprocessing before it can be parsed.

    For example:

    import re
    import json
    from io import StringIO
    
    import requests
    import pandas as pd
    from bs4 import BeautifulSoup
    
    
    for day in range(1, 31):  # April has 30 days
        print('Getting info for day {}..'.format(day))
        url = 'https://www.timeanddate.com/scripts/cityajax.php?n=belgium/antwerp&mode=historic&hd=201704{:02d}&month=4&year=2017&json=1'.format(day)
    
        data = requests.get(url).text
        # quote the bare keys c, h and s so the response becomes valid JSON
        data = json.loads(re.sub(r'(c|h|s):', r'"\1":', data))
    
        # uncomment this to print raw data:
        # print(json.dumps(data, indent=4))
    
        # construct the table from json:
        table = '<table>'
        for row in data:
            table += '<tr>'
            for cell in row['c']:
                # each cell's 'h' value is an HTML fragment; keep only its text
                table += '<td>' + BeautifulSoup(cell['h'], 'html.parser').get_text(strip=True, separator=' ') + '</td>'
            table += '</tr>'
        table += '</table>'
    
        # `table` now holds an HTML table; parse it with BeautifulSoup, or pass it to pandas
        # (wrapped in StringIO, because newer pandas versions deprecate literal HTML strings):
        df = pd.read_html(StringIO(table))[0]
        print(df)
        print('-' * 120)
    

    Prints:

    Getting info for day 1..
                          0   1      2                            3      4  5     6          7      8
    0   12:20 am Sat, Apr 1 NaN  50 °F                       Clear.  2 mph  ↑   94%  29.92 "Hg   2 mi
    1              12:50 am NaN  46 °F                         Fog.  2 mph  ↑  100%  29.92 "Hg   2 mi
    2               1:20 am NaN  48 °F                   Light fog.  3 mph  ↑   87%  29.89 "Hg   0 mi
    3               1:50 am NaN  48 °F                       Clear.  3 mph  ↑   94%  29.89 "Hg   1 mi
    4               2:20 am NaN  46 °F                         Fog.  5 mph  ↑  100%  29.89 "Hg   1 mi
    5               3:20 am NaN  46 °F                       Clear.  3 mph  ↑   93%  29.89 "Hg   1 mi
    6               3:50 am NaN  46 °F                         Fog.  6 mph  ↑   93%  29.86 "Hg   1 mi
    7               4:20 am NaN  46 °F                         Fog.  3 mph  ↑  100%  29.86 "Hg   1 mi
    8               4:50 am NaN  46 °F                         Fog.  3 mph  ↑  100%  29.86 "Hg   1 mi
    9               5:20 am NaN  46 °F                         Fog.  2 mph  ↑   93%  29.86 "Hg   2 mi
    10              5:50 am NaN  48 °F                       Clear.  3 mph  ↑   87%  29.86 "Hg   4 mi
    11              6:20 am NaN  48 °F                       Clear.  5 mph  ↑   87%  29.83 "Hg   4 mi
    12              6:50 am NaN  48 °F                       Clear.  5 mph  ↑   94%  29.86 "Hg   4 mi
    13              7:20 am NaN  50 °F            Sprinkles. Clear.  6 mph  ↑   94%  29.86 "Hg   4 mi
    14              7:50 am NaN  52 °F    Sprinkles. Broken clouds.  9 mph  ↑   88%  29.86 "Hg   3 mi
    15              8:20 am NaN  52 °F    Light rain. Partly sunny.  8 mph  ↑   88%  29.86 "Hg   5 mi
    16              8:50 am NaN  52 °F  Light rain. Passing clouds.  6 mph  ↑   94%  29.86 "Hg   5 mi
    17              9:20 am NaN  52 °F       Drizzle. Partly sunny.  5 mph  ↑   94%  29.86 "Hg   5 mi
    18              9:50 am NaN  52 °F               Broken clouds.  5 mph  ↑   94%  29.86 "Hg   5 mi
    19             10:20 am NaN  52 °F               Broken clouds.  6 mph  ↑   94%  29.89 "Hg    NaN
    20             10:50 am NaN  52 °F    Sprinkles. Broken clouds.  8 mph  ↑   94%  29.89 "Hg   5 mi
    21             11:20 am NaN  52 °F                Partly sunny.  5 mph  ↑   94%  29.89 "Hg    NaN
    22             11:50 am NaN  54 °F            Scattered clouds.  2 mph  ↑   88%  29.89 "Hg    NaN
    23             12:20 pm NaN  55 °F            Scattered clouds.  5 mph  ↑   82%  29.89 "Hg    NaN
    24             12:50 pm NaN  55 °F            Scattered clouds.  3 mph  ↑   77%  29.89 "Hg    NaN
    25              1:20 pm NaN  57 °F              Passing clouds.  5 mph  ↑   72%  29.89 "Hg    NaN
    26              1:50 pm NaN  57 °F              Passing clouds.  3 mph  ↑   67%  29.89 "Hg    NaN
    27              2:20 pm NaN  57 °F              Passing clouds.  7 mph  ↑   72%  29.89 "Hg    NaN
    28              2:50 pm NaN  57 °F            Scattered clouds.  3 mph  ↑   72%  29.89 "Hg    NaN
    29              3:20 pm NaN  55 °F    Sprinkles. Broken clouds.  9 mph  ↑   77%  29.89 "Hg   4 mi
    30              3:50 pm NaN  55 °F    Sprinkles. Broken clouds.  3 mph  ↑   77%  29.86 "Hg   5 mi
    31              4:20 pm NaN  55 °F    Sprinkles. Broken clouds.  2 mph  ↑   82%  29.89 "Hg    NaN
    32              4:50 pm NaN  57 °F            Scattered clouds.  2 mph  ↑   77%  29.86 "Hg    NaN
    33              5:20 pm NaN  57 °F            Scattered clouds.  7 mph  ↑   72%  29.89 "Hg    NaN
    34              5:50 pm NaN  55 °F            Scattered clouds.  6 mph  ↑   88%  29.89 "Hg    NaN
    35              6:20 pm NaN  55 °F              Passing clouds.  6 mph  ↑   82%  29.89 "Hg    NaN
    36              6:50 pm NaN  55 °F              Passing clouds.  3 mph  ↑   82%  29.89 "Hg    NaN
    37              7:20 pm NaN  54 °F              Passing clouds.  5 mph  ↑   94%  29.89 "Hg    NaN
    38              7:50 pm NaN  54 °F              Passing clouds.  5 mph  ↑   88%  29.89 "Hg    NaN
    39              8:20 pm NaN  54 °F              Passing clouds.  7 mph  ↑   88%  29.92 "Hg    NaN
    40              8:50 pm NaN  54 °F                       Clear.  7 mph  ↑   88%  29.92 "Hg  10 mi
    41              9:20 pm NaN  54 °F                       Clear.  2 mph  ↑   88%  29.92 "Hg  10 mi
    42              9:50 pm NaN  52 °F                       Clear.  5 mph  ↑   94%  29.92 "Hg  10 mi
    43             10:20 pm NaN  48 °F                       Clear.  2 mph  ↑  100%  29.95 "Hg  10 mi
    44             10:50 pm NaN  52 °F                       Clear.  3 mph  ↑   88%  29.95 "Hg   4 mi
    45             11:20 pm NaN  46 °F                         Fog.  2 mph  ↑   93%  29.95 "Hg   1 mi
    46             11:50 pm NaN  46 °F                       Clear.  3 mph  ↑   93%  29.95 "Hg   0 mi
    ------------------------------------------------------------------------------------------------------------------------
    Getting info for day 2..
                          0   1      2                  3       4  5     6          7      8
    0   12:20 am Sun, Apr 2 NaN  45 °F               Fog.   2 mph  ↑  100%  29.95 "Hg   0 mi
    1              12:50 am NaN  45 °F               Fog.   2 mph  ↑   93%  29.98 "Hg   1 mi
    2               1:20 am NaN  45 °F               Fog.   2 mph  ↑  100%  29.95 "Hg   0 mi
    3               1:50 am NaN  45 °F             Clear.   3 mph  ↑   87%  29.98 "Hg   4 mi
    4               2:20 am NaN  48 °F             Clear.   6 mph  ↑   87%  29.98 "Hg  10 mi
    5               2:50 am NaN  48 °F             Clear.   2 mph  ↑   87%  29.98 "Hg  10 mi
    6               3:20 am NaN  48 °F             Clear.   5 mph  ↑   87%  29.98 "Hg  10 mi
    7               3:50 am NaN  48 °F             Clear.   2 mph  ↑   87%  29.98 "Hg   6 mi
    8               4:50 am NaN  46 °F             Clear.   2 mph  ↑   87%  30.01 "Hg  10 mi
    9               5:20 am NaN  46 °F    Passing clouds.   3 mph  ↑   87%  30.01 "Hg    NaN
    10              5:50 am NaN  46 °F             Clear.   2 mph  ↑   87%  30.01 "Hg  10 mi
    11              6:20 am NaN  46 °F             Clear.   1 mph  ↑   87%  30.04 "Hg   4 mi
    12              6:50 am NaN  45 °F         Light fog.   2 mph  ↑   93%  30.04 "Hg   5 mi
    
    
    ... and so on.
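
The preprocessing step can be tried offline before hitting the site. The following is only a minimal sketch: `SAMPLE` is a made-up payload shaped like the structure the loop above relies on (a list of rows, each with a `c` list of cells, each cell with an `h` HTML string), and `quote_keys`, `cell_text`, `rows_from_payload` and `day_urls` are hypothetical helper names. It uses a slightly stricter regex that only quotes keys following `{` or `,`, and a crude regex tag stripper in place of BeautifulSoup so that it runs with the standard library alone. `day_urls` assumes the URL pattern from the answer also holds for other months, which has not been verified here.

```python
import calendar
import html
import json
import re

# Made-up sample (an assumption, not a real response from the site).
SAMPLE = ('[{c:[{h:"12:20 am<br>Sat, Apr 1",s:1},{h:"50&nbsp;&deg;F"},{h:"Clear."}]},'
          '{c:[{h:"12:50 am"},{h:"46&nbsp;&deg;F"},{h:"Fog."}]}]')


def quote_keys(raw):
    """Quote the bare c/h/s keys so the payload becomes valid JSON."""
    # Only keys that directly follow '{' or ',' are touched, which avoids
    # most accidental matches inside the string values.
    return re.sub(r'([{,])\s*(c|h|s):', r'\1"\2":', raw)


def cell_text(fragment):
    """Crude tag stripper: a simplification, not a real HTML parser."""
    text = re.sub(r'<[^>]+>', ' ', fragment)   # replace tags with spaces
    text = html.unescape(text)                 # decode entities such as &nbsp;
    return ' '.join(text.split())              # collapse all whitespace


def rows_from_payload(raw):
    """Return the payload as a list of rows, each a list of cell texts."""
    data = json.loads(quote_keys(raw))
    return [[cell_text(cell['h']) for cell in row['c']] for row in data]


def day_urls(year, month, place='belgium/antwerp'):
    """Build one Ajax URL per day of the month (URL pattern assumed from the answer)."""
    days_in_month = calendar.monthrange(year, month)[1]
    template = ('https://www.timeanddate.com/scripts/cityajax.php'
                '?n={place}&mode=historic&hd={y}{m:02d}{d:02d}'
                '&month={m}&year={y}&json=1')
    return [template.format(place=place, y=year, m=month, d=d)
            for d in range(1, days_in_month + 1)]


rows = rows_from_payload(SAMPLE)
for row in rows:
    print(row)

urls = day_urls(2017, 4)
print(len(urls), 'URLs, first:', urls[0])
```

Each list returned by `rows_from_payload` could be passed to `pd.DataFrame(rows)` directly, which would skip the intermediate HTML table. Whether the real responses always match this shape is something to check against the raw data first.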