python python-requests python-requests-html

Retrieving data from an AJAX request


I am trying to scrape data for all past UFC matches from this page: https://www.ufc.com/events#events-list-past.

The page loads further results through a "load more" AJAX request. When I send a request to that AJAX endpoint myself, it returns 404:

import requests

headers = {'Accept': 'text/html',
           'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/112.0'}
url = 'https://www.ufc.com/views/ajax?_wrapper_format=drupal_ajax'
page = requests.get(url, headers=headers)
print(page.status_code)

How can I load all these matches?


Solution

  • You forgot to send the payload with the request. The code below sends it as form data in a POST request:

    import requests
    from bs4 import BeautifulSoup
    
    api_url = 'https://www.ufc.com/views/ajax?_wrapper_format=drupal_ajax'
    
    payload = {
        "view_name": "events_upcoming_past_solr",
        "view_display_id": "past",
        "view_args": "",
        "view_path": "/events",
        "view_base_path": "",
        "view_dom_id": "8177974293330a03393b98761bd5673fd3b427b60e6d7c5b5b54a6ca34144791",
        "pager_element": "0",
        "page": 1,
        "_drupal_ajax": "1",
        "ajax_page_state[theme]": "ufc",
        "ajax_page_state[theme_token]": "",
        "ajax_page_state[libraries]": "addtoany/addtoany,classy/base,classy/messages,core/normalize,core/picturefill,dve/video,field_group/element.horizontal_tabs,google_analytics/google_analytics,layout_discovery/onecol,poll/drupal.poll-links,system/base,ufc/buttons,ufc/card-content-footer,ufc/card-event,ufc/carousel,ufc/filter-select,ufc/global-css,ufc/global-js,ufc/greensock-tweenmax,ufc/hero,ufc/how-to-watch-event,ufc/how-to-watch-event-group,ufc/in-action,ufc/links-dropdown,ufc/menu-main,ufc/page,ufc/sidebar-first,ufc/site-footer,ufc/site-header,ufc/tabs,ufc_events/eventsList,ufc_localization/region,ufc_localization/timezoner,ufc_yext/search_bar,views/views.ajax,views/views.module,views_infinite_scroll/views-infinite-scroll"
    }
    
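    # Page numbering starts at 0; raise the upper bound to load more pages.
    for page in range(0, 4):
        payload['page'] = page
        # The response is a JSON list; its last item carries the rendered HTML in 'data'.
        data = requests.post(api_url, data=payload).json()
        soup = BeautifulSoup(data[-1]['data'], 'html.parser')
        # Print the event titles found on this page:
        for title in soup.select('h3'):
            print(title.text.strip())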
    

    Prints:

    Pavlovich vs Blaydes
    Holloway vs Allen
    Pereira vs Adesanya 2
    Vera vs Sandhagen
    Edwards vs Usman 3
    Yan vs Dvalishvili
    Jones vs Gane
    Muniz vs Allen
    Andrade vs Blanchfield
    Makhachev vs Volkanovski
    Lewis vs Spivac
    Teixeira vs Hill
    Strickland vs Imavov
    Cannonier vs Strickland
    Blachowicz vs Ankalaev
    Thompson vs Holland
    Nzechukwu vs Cutelaba
    Adesanya vs Pereira
    Rodriguez vs Lemos
    Kattar vs Allen
    Oliveira vs Makhachev
    Grasso vs Araujo
    Dern vs Yan
    
    ...
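
If the total number of pages is not known in advance, one option is to keep requesting pages until one comes back with no titles. The following is only a minimal sketch of that stopping rule. It assumes the endpoint returns an empty listing once the past events run out, which has not been verified here. The names `collect_all_titles`, `fetch_titles`, `max_pages` and the fake data are hypothetical and exist only for illustration. The network call is replaced by an offline stand-in so the loop can be run on its own.

```python
from typing import Callable, List


def collect_all_titles(fetch_titles: Callable[[int], List[str]],
                       max_pages: int = 500) -> List[str]:
    """Request pages 0, 1, 2, ... and stop at the first empty page."""
    all_titles: List[str] = []
    # Hard cap (hypothetical value) so a misbehaving endpoint cannot loop forever.
    for page in range(max_pages):
        titles = fetch_titles(page)
        if not titles:
            # Assumption: an empty page means the listing is exhausted.
            break
        all_titles.extend(titles)
    return all_titles


# Offline stand-in for the real request, so the sketch runs without network access.
FAKE_PAGES = [["Event A", "Event B"], ["Event C"]]


def fake_fetch_titles(page: int) -> List[str]:
    """Return the fake titles for a page, or an empty list past the last page."""
    return FAKE_PAGES[page] if page < len(FAKE_PAGES) else []


print(collect_all_titles(fake_fetch_titles))
# prints ['Event A', 'Event B', 'Event C']
```

In the real scraper, `fetch_titles` would set `payload['page']`, post the payload to `api_url`, and return the stripped `h3` texts, exactly as the body of the loop in the answer does.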