python, web-scraping, beautifulsoup

How to Scrape Dynamically-Loaded Store URLs from Web Page


I'm working on a web scraping project and I'm attempting to extract a list of store URLs from the following page: https://maroof.sa/businesses. Here are the methods I've tried thus far, but without success:

  • Using BeautifulSoup and Requests in Python to parse the HTML, but I was unable to locate the correct tags/classes that contain the store URLs.
  • Employing Selenium to wait for JavaScript to render the store links and then extract them, but the appropriate elements


Solution

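  • The store list comes back as JSON from an API endpoint, so you can try requesting that endpoint directly instead of parsing the HTML: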

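    import json
    import requests
    
    
    # Query the JSON search endpoint directly instead of parsing rendered HTML.
    url = "https://api.thiqah.sa/maroof/public/api/app/business/search?keyword=&businessTypeId=&businessTypeSubCategoryId=&regionId=&cityId=&certificationType=&sortBy=2&sortDirection=2&sorting=&skipCount=0&maxResultCount=10"
    headers = {"apikey": "c1qesecmag8GSbxTHGRjfnMFBzAH7UAN"}
    
    response = requests.get(url, headers=headers, timeout=30)
    response.raise_for_status()  # stop on an HTTP error instead of parsing an error body
    data = response.json()
    
    # Uncomment to inspect the full response:
    # print(json.dumps(data, indent=4))
    
    # Each item carries the Arabic store name and the id used in the details URL.
    for item in data["items"]:
        print(item["nameAr"])
        print(f"https://maroof.sa/businesses/details/{item['id']}")
        print()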
    

    Prints:

    متجر بياض
    https://maroof.sa/businesses/details/229217
    
    متجر مروه
    https://maroof.sa/businesses/details/48066
    
    المتسوقة مآثر
    https://maroof.sa/businesses/details/168551
    
    متجر أحمد للأنظمة الصوتية والألكترونيات
    https://maroof.sa/businesses/details/25650
    
    متعب للأرقام المميزة
    https://maroof.sa/businesses/details/253838
    
    متجر شوب تتش
    https://maroof.sa/businesses/details/246531
    
    مؤسسة ماجد عبدالله الشهراني للمقاولات
    https://maroof.sa/businesses/details/244174
    
    شركة إنجاز للخدمات 
    https://maroof.sa/businesses/details/276939
    
    موقع عاملتي الرقمي
    https://maroof.sa/businesses/details/261892
    
    مندوبكم
    https://maroof.sa/businesses/details/112807
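
The request above returns only the first 10 stores. Judging by their names, `skipCount` looks like an offset and `maxResultCount` like a page size, so a paging loop might look like the sketch below. This is an assumption based on the parameter names, not confirmed API behaviour, and the sketch also assumes that an empty `items` list marks the end of the results. The helpers `build_search_url` and `iter_store_urls` are hypothetical names introduced here. The HTTP call is passed in as `fetch_page` so the loop can be tried without touching the network.

```python
from typing import Callable, Iterator
from urllib.parse import urlencode

API_URL = "https://api.thiqah.sa/maroof/public/api/app/business/search"
DETAILS_URL = "https://maroof.sa/businesses/details/{id}"


def build_search_url(skip_count: int, page_size: int = 10) -> str:
    """Rebuild the search URL from the answer with a different offset.

    Assumption: skipCount is an offset and maxResultCount a page size.
    """
    params = {
        "keyword": "",
        "businessTypeId": "",
        "businessTypeSubCategoryId": "",
        "regionId": "",
        "cityId": "",
        "certificationType": "",
        "sortBy": 2,
        "sortDirection": 2,
        "sorting": "",
        "skipCount": skip_count,
        "maxResultCount": page_size,
    }
    return f"{API_URL}?{urlencode(params)}"


def iter_store_urls(
    fetch_page: Callable[[str], dict],
    page_size: int = 10,
    max_pages: int = 3,
) -> Iterator[str]:
    """Yield store detail URLs page by page.

    fetch_page takes a URL and returns the decoded JSON response.
    Stops after max_pages, or earlier if a page has no items
    (assumed, not confirmed, to mean there are no more results).
    """
    for page in range(max_pages):
        data = fetch_page(build_search_url(page * page_size, page_size))
        items = data.get("items", [])
        if not items:
            break
        for item in items:
            yield DETAILS_URL.format(id=item["id"])


# Possible usage with requests, reusing the apikey header from the answer:
#
#   import requests
#   headers = {"apikey": "c1qesecmag8GSbxTHGRjfnMFBzAH7UAN"}
#   fetch = lambda u: requests.get(u, headers=headers, timeout=30).json()
#   for store_url in iter_store_urls(fetch, max_pages=3):
#       print(store_url)
```

Keeping `max_pages` small while testing avoids sending a long run of requests to the site; adding a short `time.sleep` between pages would be a reasonable courtesy as well.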
    
