Here is what I'm doing
import requests
from requests.adapters import HTTPAdapter
from bs4 import BeautifulSoup
HEADERS = {
    'authority': 'www.noon.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'dnt': '1',
    'upgrade-insecure-requests': '1',
    'accept': '*/*',
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36',
    'sec-fetch-site': 'none',
    'sec-fetch-mode': 'navigate',
    'sec-fetch-dest': 'document'
}
response = requests.get('https://www.noon.com/uae-en/electronics-and-mobiles/mobiles-and-accessories/mobiles-20905',headers=HEADERS,stream=True)
soup = BeautifulSoup(response.content,'lxml')
results = soup.find_all("div", {"class" : "productContainer"})
result = results[0]
print("https://www.noon.com" + result.a.get('href'))
Output
https://www.noon.com/uae-en
But the expected output is 'https://www.noon.com/uae-en/product/N35521717A/p?o=f885efe0b6534e9f',
as you can see here in the browser:
<div class="productContainer"><a class="sc-7vj7do-0 ftlAjW" href="/uae-en/product/N35521717A/p?o=f885efe0b6534e9f" id="productBox-N35521717A"><div class="kcs0h5-0 diNcmV grid" title="Samsung Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE "><div class="e3js0d-1 efqIDW"><div class="productImage" data-qa-id="productImagePLP_Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE "><div class="lazyload-wrapper"><div class="puv25r-0 hfEfTS"><div class="puv25r-2 hJKuPa"><img alt="Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE " src="https://a.nooncdn.com/t_desktop-pdp-v1/v1605814225/N35521717A_1.jpg"/></div></div></div></div><div class="e3js0d-2 dqjnoR"><div class="tagContainer"></div></div></div><div class="e3js0d-6 iKEZJh"><div class="e3js0d-7 jULUCI"><div class="e3js0d-10 cyUANN"><span class="e3js0d-11 gXshOX">Samsung</span>Galaxy M31 Dual SIM Blue 6GB RAM 128GB 4G LTE </div></div><div class="e3js0d-8 jtiosv"><div class="sc-3751lm-0 hSumnU"><div class="sc-3751lm-1 eUJkVt large"><span class="currency">AED</span><strong>819.00</strong></div><div class="sc-3751lm-2 kWnsOk"><span class="oldPrice">AED<!-- --> <!-- -->859</span></div></div></div><div class="e3js0d-9 kDpjlW"><div class="e3js0d-12 gMFqig"><div class="u8zs36-0 kRPdZJ"><img alt="noon-express" height="20px" src="https://a.nooncdn.com/s/app/com/noon/images/fulfilment_express-en.png" width="80px"/></div></div></div></div></div></a></div>
What happens and steps to reproduce
The website appears to add part of this content dynamically.
Open the website in a browser.
Open the page source (Ctrl+U) and search for class="productContainer": the href of the <a> contains only /uae-en. That is what you get by using requests.
Open the inspector (Ctrl+Shift+I) and inspect the same <a>: there you will find the dynamically added part, which is what you get if you use selenium.
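To make the difference concrete, here is a small offline sketch that runs the same link extraction on two HTML snippets. Both snippets are shortened, hypothetical stand-ins: `static_html` imitates what the page source is described as containing, and `rendered_html` imitates the inspector view. The `ProductLinkParser` class and `extract_links` helper are names made up for this illustration, and the standard library parser is used instead of BeautifulSoup only to keep the sketch dependency-free.

```python
from html.parser import HTMLParser


class ProductLinkParser(HTMLParser):
    """Collect the href of the first <a> that follows each div.productContainer."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._in_container = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "productContainer" in classes:
            self._in_container = True
        elif tag == "a" and self._in_container:
            self.links.append(attrs.get("href"))
            self._in_container = False


def extract_links(html):
    parser = ProductLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical stand-in for the page source (what requests receives)
static_html = '<div class="productContainer"><a href="/uae-en"></a></div>'
# Hypothetical stand-in for the DOM after the scripts have run (what selenium sees)
rendered_html = (
    '<div class="productContainer">'
    '<a href="/uae-en/product/N35521717A/p?o=f885efe0b6534e9f"></a></div>'
)

print(extract_links(static_html))
print(extract_links(rendered_html))
```

The extraction logic is identical in both calls; only the input differs, which suggests the problem lies in what requests downloads rather than in how the HTML is parsed.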
Minimal example
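import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
# Raw string so the backslashes in the Windows path are not read as escapes.
# Passing the driver path positionally is Selenium 3 style; newer versions
# expect a Service object or locate the driver themselves.
browser = webdriver.Chrome(r'C:\Program Files\ChromeDriver\chromedriver.exe')
browser.get('https://www.noon.com/uae-en/electronics-and-mobiles/mobiles-and-accessories/mobiles-20905')
time.sleep(3)  # crude wait for the dynamically added links
# find_elements returns every matching product link, not only the first one
elements = browser.find_elements(By.XPATH, "//div[contains(@class, 'productContainer')]/a")
for element in elements:
    ActionChains(browser).move_to_element(element).perform()
    print(element.get_attribute('href'))
browser.close()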
Output
https://www.noon.com/uae-en/product/N35521717A/p?o=f885efe0b6534e9f
https://www.noon.com/uae-en/product/N41247213A/p?o=ca38c8921770ea2a
https://www.noon.com/uae-en/product/N41247235A/p?o=c97b8bfdc0114cba
https://www.noon.com/uae-en/product/N39790555A/p?o=d7354e20a0bb00ad
https://www.noon.com/uae-en/product/N32046052A/p?o=faea2e69f38bbf6a
...
EDIT
You won't get this information with requests by scraping the page source, but there is an alternative way.
You could call the API with requests and build the link yourself (a simple example you can customize):
import requests
url = "https://www.noon.com/_svc/catalog/api/u/electronics-and-mobiles/mobiles-and-accessories/mobiles-20905"
headers = {
"user-agent": "Mozilla/5.0"
}
response = requests.get(url, headers=headers)
response.raise_for_status()
records = response.json()["hits"]
for record in records:
    offer_code = record["offer_code"]
    sku = record["sku"]
    url = record["url"]
    print(f"https://www.noon.com/uae-en/{url}/{sku}/p?o={offer_code}")
Output
https://www.noon.com/uae-en/galaxy-m31-dual-sim-blue-6gb-ram-128gb-4g-lte/N35521717A/p?o=f885efe0b6534e9f
https://www.noon.com/uae-en/iphone-12-pro-max-with-facetime-128gb-pacific-blue-5g-international-specs/N41247213A/p?o=ca38c8921770ea2a
https://www.noon.com/uae-en/iphone-12-pro-with-facetime-256gb-pacific-blue-5g-international-specs/N41247235A/p?o=cfab59c09cab747b
...
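If you want to reuse the link building, it could be wrapped in a small helper. This is only a sketch: `build_product_url` and `BASE_URL` are names invented here, the first sample record is reconstructed from the first output line above, and the second record is a hypothetical incomplete hit added to show the fallback. Whether the real API ever omits these fields is an assumption, not something confirmed by the response.

```python
BASE_URL = "https://www.noon.com/uae-en"


def build_product_url(record, base_url=BASE_URL):
    """Return the product link for one API record, or None if a field is missing."""
    try:
        slug = record["url"]
        sku = record["sku"]
        offer_code = record["offer_code"]
    except KeyError:
        return None
    return f"{base_url}/{slug}/{sku}/p?o={offer_code}"


sample_hits = [
    # Values taken from the first line of the output above
    {
        "url": "galaxy-m31-dual-sim-blue-6gb-ram-128gb-4g-lte",
        "sku": "N35521717A",
        "offer_code": "f885efe0b6534e9f",
    },
    # Hypothetical incomplete record, skipped by the filter below
    {"sku": "N00000000A"},
]

links = [link for link in map(build_product_url, sample_hits) if link is not None]
print(links)
```

With live data you would pass `response.json()["hits"]` in place of `sample_hits`.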