Tags: html, selenium-webdriver, web-scraping, scrapy, extract

Extract hidden links from a web page


Please check this link: https://maroof.sa/businesses.

It is a link to a website from which I want to extract links.

For example, if you scroll down you will find a store named "Marwa store"; if you click on its card, it redirects you to the store's page.

I need to scrape all the store links on the page https://maroof.sa/businesses.

After inspecting the page, I found that the links are hidden.

I have successfully extracted the store names, but I can't find the links.

Thanks in advance.

import time
from selenium.webdriver.support.select import Select
from selenium.webdriver.common.by import By
from selenium import webdriver
from scrapy import Selector
import csv

driver = webdriver.Chrome()
driver.get(url="https://maroof.sa/businesses")
html = driver.page_source
# the store cards are found, but they contain no visible href to follow
names = driver.find_elements(By.CSS_SELECTOR, 'div.storeCard')

Solution

  • It's impossible to get the business link from the card markup itself; however, it can be built from the data returned by the request whose URL contains business/search.

    The business link can be built with the pattern {url}/details/{id}, where id is taken from the items array of the response JSON.

    You can capture that response using the Chrome DevTools Protocol (CDP), which is now available in Selenium; a more robust polling variant is sketched after the code below.

    The site also has an anti-scraping mechanism and doesn't load every time for me, so you may need a proxy, Undetected Selenium, or similar. I added some stealth Chrome options, but they don't always get past the bot detection (the site flags me as a bot even in a regular browser, so I think their detection is broken).

    import json
    import time
    
    from selenium import webdriver
    
    # enable Chrome performance logging so network events can be read with get_log("performance")
    options = webdriver.ChromeOptions()
    options.set_capability('goog:loggingPrefs', {'performance': 'ALL'})
    
    def enable_stealth():
        # flags that reduce the chance of automated-browser detection
        options.add_argument("--no-sandbox")
        options.add_argument("--disable-gpu")
        options.add_argument('--disable-blink-features=AutomationControlled')
        options.add_argument('--disable-dev-shm-usage')
        options.add_experimental_option("useAutomationExtension", False)
        options.add_argument("--enable-javascript")
        options.add_argument("--enable-cookies")
        options.add_argument('--disable-web-security')
        options.add_experimental_option("excludeSwitches", ["enable-automation"])
    
    enable_stealth()
    driver = webdriver.Chrome(options=options)
    url = "https://maroof.sa/businesses"
    driver.get(url)
    # give the page time to fire the business/search XHR before reading the performance log
    time.sleep(5)
    logs = driver.get_log("performance")
    # fragment that identifies the XHR carrying the store data
    target_url = 'business/search'
    
    def get_links():
        # scan the captured network events for the business/search response
        for log in logs:
            message = log["message"]
            if "Network.responseReceived" not in message:
                continue
            params = json.loads(message)["message"].get("params")
            if params is None:
                continue
            response = params.get("response")
            if response is None or target_url not in response["url"]:
                continue
            # fetch the response body via CDP and build one link per store id
            body = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': params["requestId"]})
            items = json.loads(body['body'])['items']
            for item in items:
                link = f"{url}/details/{item['id']}"
                print(link)
    
    get_links()
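
    A caveat with the snippet above: each call to get_log("performance") returns only the entries gathered since the previous call, and the business/search response may not have arrived yet when the log is read once. Below is a minimal sketch that polls the log until the response appears and then writes the collected links to a CSV file (the question already imports csv, presumably for this). The collect_store_links helper, the 30-second timeout, the 1-second poll interval, and the store_links.csv filename are my own assumptions, not part of the original answer.

    import csv
    import json
    import time

    def collect_store_links(driver, base_url, target_fragment='business/search', timeout=30):
        """Poll the performance log until the store-search response is seen.

        driver.get_log("performance") only returns entries gathered since the
        previous call, so we keep polling and scanning until the target
        response shows up or the timeout expires.
        """
        links = []
        deadline = time.time() + timeout
        while time.time() < deadline and not links:
            for entry in driver.get_log("performance"):
                event = json.loads(entry["message"])["message"]
                if event.get("method") != "Network.responseReceived":
                    continue
                params = event.get("params", {})
                if target_fragment not in params.get("response", {}).get("url", ""):
                    continue
                # pull the JSON body of the matching response via CDP
                body = driver.execute_cdp_cmd(
                    'Network.getResponseBody', {'requestId': params["requestId"]})
                for item in json.loads(body['body'])['items']:
                    links.append(f"{base_url}/details/{item['id']}")
            time.sleep(1)  # wait a moment before polling the log again
        return links

    # usage: assumes `driver` was started with the logging prefs shown above
    # and has already navigated to https://maroof.sa/businesses
    links = collect_store_links(driver, "https://maroof.sa/businesses")
    with open("store_links.csv", "w", newline="") as f:
        csv.writer(f).writerows([link] for link in links)

    Note that Network.getResponseBody can fail if Chrome has already discarded the body, so in practice you may want to wrap that call in a try/except and keep polling.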