I have been trying to scrape a website that lists every football match scheduled on a specific field. The site is dynamic, and I have tried several approaches to solve the problem. I used the requests_html library, and it renders every HTML element until it reaches the <aside> tag. For that element, the query just returns None. Can an <aside> tag prevent web scraping, or is my code flawed in some other way?
This is the code I tried to use.
from requests_html import HTMLSession
url = "https://tulospalvelu.palloliitto.fi/location/598/fixtures"
s = HTMLSession()
r = s.get(url)
r.html.render(sleep=1)
fixtures = r.html.xpath('//*[@id="scrolldiv"]/main/div/aside', first=True)
print(fixtures)
None
Process finished with exit code 0
When I use a different XPath that isn't inside the <aside> tag, it returns the element properly:
from requests_html import HTMLSession
url = "https://tulospalvelu.palloliitto.fi/location/598/fixtures"
s = HTMLSession()
r = s.get(url)
r.html.render(sleep=1)
fixtures = r.html.xpath('//*[@id="scrolldiv"]/main/div', first=True)
print(fixtures)
<Element 'div' class=('v-main__wrap',)>
Process finished with exit code 0
Is there something I can do to avoid this issue?
To get the data into a pandas DataFrame, you can skip the rendering step and request the JSON API directly, as in the following example:
import requests
import pandas as pd
api_url = 'https://spl.torneopal.net/taso/rest/getMatches?start_date=2022-01-01&venue_location_id=598'
headers = {'Accept': 'json/df8e84j9xtdz269euy3h'}
data = requests.get(api_url, headers=headers).json()
df = pd.DataFrame(data['matches'])
print(df.head())
Prints:
match_id match_number match_report match_external_id age_group aggregate_match competition_id competition_name competition_result_service_name competition_logo competition_status competition_officiality season_id organiser organiser_private category_id category_name category_logo category_logo_dark_bg category_group category_group_name category_group_name_en category_live calculate_points season_order group_id group_name phase_number phase_name group_type alternative_groups match_type round_id round_name round_date_begin round_date_end round_team_id round_club_id round_club_name stage stage_name standings_name date time reschedule time_reservation_start time_end time_zone time_zone_offset sunset club_A_id club_A_name club_A_abbrevation club_A_www club_A_crest team_A_id team_A_name team_A_abbrevation team_A_description team_A_description_en team_A_www club_A_distance team_A_home_venue_id team_A_primary_category_id club_B_id club_B_name club_B_abbrevation club_B_www club_B_distance club_B_crest team_B_id team_B_name team_B_abbrevation team_B_description team_B_description_en team_B_www team_B_home_venue_id team_B_primary_category_id statistics_level status forfeit_A forfeit_B disqualify_A disqualify_B winner winner_id walkover fs_A fs_B hts_A hts_B ns_A ns_B es_A es_B ps_A ps_B suspensions_A suspensions_B suspensions_officials_A suspensions_officials_B best_of_match best_of_count can_start_live lineups_filled best_of_sequence track_scorers track_assists starting_players sport_id match_report_exists p1_start_team p1_start_time p1_end_time p1_duration p1s_A p1s_B p1_winner p2_start_team p2_start_time p2_end_time p2_duration p2s_A p2s_B p2_winner p3_start_team p3_start_time p3_end_time p3_duration p3s_A p3s_B p3_winner p4_start_team p4_start_time p4_end_time p4_duration p4s_A p4s_B p4_winner p5_start_team p5_start_time p5_end_time p5_duration p5s_A p5s_B p5_winner position_name players_aside serve_order players_substitutes substitutions max_fouls max_timeouts 
connected_matches referee_1_id referee_1_name referee_1_player_id referee_2_id referee_2_name referee_2_player_id assistant_referee_1_id assistant_referee_1_name assistant_referee_1_player_id assistant_referee_2_id assistant_referee_2_name assistant_referee_2_player_id referee_1_join referee_2_join assistant_referee_1_join assistant_referee_2_join venue_id venue_name venue_name_competition venue_city_name venue_city_id venue_area_id venue_referee_club_id venue_area_name venue_location_id venue_location_name venue_suburb_name attendance report_attendance referee_classification assistant_referee_classification report_result playing_time_min period_count win_period_count period_count_fixed period_min extra_period_count extra_period_min ps_count live_period live_time live_time_mmss live_minutes live_A live_B live_ps_A live_ps_B live_timeouts_A live_timeouts_B live_timeout_skip_A live_timeout_skip_B live_serve_team live_fouls_A live_fouls_B live_timer_start live_timer_start_time live_timer_on temperature weather stream_url ticket_url stream stream_media stream_media_name stream_img tv notice result_notice stamp fourth_official_id fourth_official_name fourth_official_player_id assign_fourth_official delegate_1_id delegate_1_name assign_delegate_1 delegate_2_id delegate_2_name assign_delegate_2 timestamp
0 2331269 92 1952 M spljp22 SPL Jalkapallo 2022 archived official 2022 spl 0 MSC Miesten Suomen Cup https://cdn.torneopal.net/img/palloliitto/spl/MSCblue.png 1 Miehet 1 1 66 7 1. Kierros 9 knockout_final [] single 4 Kori 4 2022-01-29 2022-03-09 alku 2022-02-20 12:00:00 0 12:00:00 14:00:00 Europe/Helsinki +0200 17:24 156 Koivukylän Palloseura KoiPS https://cdn.torneopal.net/logo/palloliitto/156x.png 130806 KoiPS/Dynamo G/141 KoiPS/Dynamo M5 151 Kultsu FC Kultsu FC https://cdn.torneopal.net/logo/palloliitto/151x.png 60503 Kultsu FC G/142 Kultsu FC 2788 M3 Played Away 60503 0 1 6 0 3 0 0 1 1 football 16014 Sirviö Ville 17329 Sorvanto Mikael 17586 Nassar Saleh 600 Tikkurila TN Vantaa 92 116 ETE Pääkaupunkiseutu 598 Tikkurila 20 M2 M2 0 120 2 2 0 45 5 -1 00:00 0000-00-00 00:00:00 00:00:00 0 1682855815
1 2353548 10 8519 M spljp22 SPL Jalkapallo 2022 archived official 2022 spl 0 MSC Miesten Suomen Cup https://cdn.torneopal.net/img/palloliitto/spl/MSCblue.png 1 Miehet 1 1 66 4 Tasaus 10 knockout_final [] single 2 Kori 2 2022-01-29 2022-02-20 alku 2022-02-20 20:00:00 0 20:00:00 22:00:00 Europe/Helsinki +0200 17:24 5176 Nikinmäki United Nikinmäki United https://cdn.torneopal.net/logo/palloliitto/5176x.png 183596 Nikinmäki United Tasaus/19 Nikinmäki United 2508 M5 34 Esbo Bollklubb EBK https://cdn.torneopal.net/logo/palloliitto/34x.png 63806 EBK/Reservi Tasaus/20 EBK/Reservi 221 M5 Played Away 63806 0 0 3 0 1 0 0 1 1 football 12425 Karttunen Jani 12535 Kallio Sampo 15586 Selvitys pyyntö 600 Tikkurila TN Vantaa 92 116 ETE Pääkaupunkiseutu 598 Tikkurila 0 M2 M2 0 120 2 2 0 45 5 -1 00:00 0000-00-00 00:00:00 00:00:00 0 1682855815
2 2331204 54 1887 M spljp22 SPL Jalkapallo 2022 archived official 2022 spl 0 MSC Miesten Suomen Cup https://cdn.torneopal.net/img/palloliitto/spl/MSCblue.png 1 Miehet 1 1 66 7 1. Kierros 9 knockout_final [] single 2 Kori 2 2022-01-29 2022-03-09 alku 2022-03-11 20:05:00 0 20:05:00 22:05:00 Europe/Helsinki +0200 18:13 219 FC Korso FC Korso https://cdn.torneopal.net/logo/palloliitto/219x.png 63417 FC Korso/United G/55 FC Korso/United 568 M5 190 Suurmetsän Urheilijat SUMU SUMU https://cdn.torneopal.net/logo/palloliitto/190x.png 53539 SUMU/sob G/56 SUMU/sob 368 M4 Played Away 53539 0 0 2 0 0 1 1 football 10497 Juvonen Elias 12468 Juntto Joni 15748 Lahtonen Janne 600 Tikkurila TN Vantaa 92 116 ETE Pääkaupunkiseutu 598 Tikkurila 0 M2 M2 0 120 2 2 0 45 5 -1 00:00 0000-00-00 00:00:00 00:00:00 0 1682855815
3 2355504 9405 P14 etejp22 Etelä Jalkapallo 2022 archived official 2022 spletela 0 P141 P14 Ykkönen 3 Pojat 1 1 246 1 Kevät 5 Tammikuu group_stage [] single 1 2022-03-19 2022-03-20 alku 2022-03-20 13:10:00 0 13:10:00 14:40:00 Europe/Helsinki +0200 18:35 156 Koivukylän Palloseura KoiPS https://cdn.torneopal.net/logo/palloliitto/156x.png 158370 KoiPS Kevät/4 KoiPS P152 162 Hyvinkään Palloseura HyPS https://cdn.torneopal.net/logo/palloliitto/162x.png 170718 HyPS Kevät/8 HyPS 977 P141 Played Away 170718 0 1 2 1 1 0 0 1 0 football 18163 Saarela DANIEL CHRISTIAN 20799 Mattsson Rene 16012 Heino Viljami 600 Tikkurila TN Vantaa 92 116 ETE Pääkaupunkiseutu 598 Tikkurila 0 P7 P7 1 90 2 2 0 40 -1 00:00 0000-00-00 00:00:00 00:00:00 0 1682855815
4 2318289 3390 P14 etejp22 Etelä Jalkapallo 2022 archived official 2022 spletela 0 P14LE P14 Liiga Etelä 3 Pojat 1 1 238 1 1 3 Lokakuu group_stage [] single 2 2022-03-26 2022-03-26 alku 2022-03-26 10:30:00 0 10:30:00 12:00:00 Europe/Helsinki +0200 18:50 59 Tikkurilan Palloseura TiPS https://cdn.torneopal.net/logo/palloliitto/59x.png 157198 TiPS 1/9 TiPS P15LE 63 Pallokerho Keski-Uusimaa PKKU https://cdn.torneopal.net/logo/palloliitto/63x.png 168477 PKKU 1/1 PKKU 561 P15LE Played Away 168477 0 1 2 0 1 0 0 1 0 football 12989 Vanninen Arto 18332 Kuopio Juuso 12838 Raumala Petri 600 Tikkurila TN Vantaa 92 116 ETE Pääkaupunkiseutu 598 Tikkurila 0 P5 P5 1 90 2 2 0 40 -1 00:00 0000-00-00 00:00:00 00:00:00 0 1682855815
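The frame above is very wide, so it may help to build the request URL from parameters and keep only a few fields per match. The following is a minimal sketch, not part of the original answer: the helper names `build_matches_url` and `pick_fields` are hypothetical, the field selection is just an example taken from the printed header, and the sample record uses made-up placeholder values. It also assumes the API accepts `start_date` and `venue_location_id` exactly as they appear in the URL above.

```python
from urllib.parse import urlencode

# Endpoint copied from the example above.
API_BASE = "https://spl.torneopal.net/taso/rest/getMatches"

# Column names taken from the printed header; this selection is only an example.
DEFAULT_FIELDS = ("date", "time", "team_A_name", "team_B_name", "venue_name")


def build_matches_url(start_date, venue_location_id):
    """Return the getMatches URL for a start date and a venue location id.

    Hypothetical helper: it assumes the two query parameters behave as in
    the URL used in the answer.
    """
    query = urlencode(
        {"start_date": start_date, "venue_location_id": venue_location_id}
    )
    return f"{API_BASE}?{query}"


def pick_fields(matches, fields=DEFAULT_FIELDS):
    """Reduce each match dict to the requested keys; missing keys become None."""
    return [{name: match.get(name) for name in fields} for match in matches]


# Placeholder record shaped like one entry of data['matches'] (values are made up).
sample_matches = [
    {
        "match_id": "1",
        "date": "2022-02-20",
        "time": "12:00:00",
        "team_A_name": "Home FC",
        "team_B_name": "Away FC",
        "venue_name": "Example Field",
        "status": "Played",
    }
]

print(build_matches_url("2022-01-01", 598))
print(pick_fields(sample_matches))
```

With the real response, passing `pick_fields(data['matches'])` to `pd.DataFrame` should give a frame with only those five columns, which is easier to read than the full output.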