I am trying to get the image links of products from a website. I can get the image info for some of the products, but not for others. In the code below, URL1 works but URL2 throws "json.decoder.JSONDecodeError". I think the problem is that I can't extract the JSON string correctly. I am not good at regular expressions. How can I get the JSON string?
Code
import re,json,requests
url1 = "https://www.trendyol.com/samsung/akilli-smart-air-sihirli-led-tv-televizyon-kumandasi-yerine-tuslu-kumanda-1078-p-43447565?boutiqueId=61&merchantId=384846"
url2 = "https://www.trendyol.com/samsung/k-ve-m-serisi-uyumlu-led-lcd-tv-akilli-kumandasi-bn59-01259b-p-45735139?boutiqueId=61&merchantId=115135"
r = requests.get(url2)
data = json.loads(re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__=(.*?);', r.text).group(1))
images = ['https://www.trendyol.com' + img for img in data['product']['images']]
print(images)
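Your pattern (.*?); is non-greedy, so it stops at the first ";" after the "=". If a ";" appears anywhere inside the JSON itself (for example inside a string value), the captured text is cut off and json.loads fails; that is presumably what happens with URL2. The following regex is a better match for your given URLs, because it only stops at "}});", i.e. at the end of the nested dictionaries and before the start of the next block.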
import re, json, requests
url1 = "https://www.trendyol.com/samsung/akilli-smart-air-sihirli-led-tv-televizyon-kumandasi-yerine-tuslu-kumanda-1078-p-43447565?boutiqueId=61&merchantId=384846"
url2 = "https://www.trendyol.com/samsung/k-ve-m-serisi-uyumlu-led-lcd-tv-akilli-kumandasi-bn59-01259b-p-45735139?boutiqueId=61&merchantId=115135"
for url in [url1, url2]:
    r = requests.get(url)
    data = json.loads(re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__=(.*?\}\});', r.text).group(1))
    images = ['https://www.trendyol.com' + img for img in data['product']['images']]
    print(images)
    print("")
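The "}});" anchor still depends on how the object happens to end, and it would break if that sequence ever showed up inside a string. A sketch of a sturdier option, assuming the page still assigns the state as PRODUCT_DETAIL_APP_INITIAL_STATE__= followed directly by a JSON object: use the regex only to find where the JSON starts, then let json.JSONDecoder.raw_decode parse exactly one JSON value from that position and ignore whatever follows. SAMPLE_HTML is a made-up fragment (not the real Trendyol markup), and naive_capture / extract_initial_state are hypothetical helper names for illustration.

```python
import json
import re

# Made-up page fragment for illustration only: the product name contains a ';'
# inside a JSON string, and another script statement follows the object.
SAMPLE_HTML = (
    '<script>window.__PRODUCT_DETAIL_APP_INITIAL_STATE__='
    '{"product":{"name":"TV remote; BN59-01259B","images":["/a.jpg","/b.jpg"]}};'
    'window.__OTHER_STATE__={};</script>'
)


def naive_capture(html):
    """The question's pattern: the lazy (.*?) stops at the FIRST ';'."""
    return re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__=(.*?);', html).group(1)


def extract_initial_state(html):
    """Hypothetical helper: decode the single JSON value after the assignment.

    raw_decode() parses one complete JSON value starting at the given index and
    returns (obj, end_index), so ';' characters inside strings or after the
    object do not matter.
    """
    match = re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__\s*=\s*', html)
    if match is None:
        raise ValueError("initial-state assignment not found in page")
    obj, _end = json.JSONDecoder().raw_decode(html, match.end())
    return obj


if __name__ == "__main__":
    # The naive capture is truncated at the ';' inside the product name.
    print(naive_capture(SAMPLE_HTML))  # {"product":{"name":"TV remote
    try:
        json.loads(naive_capture(SAMPLE_HTML))
    except json.JSONDecodeError as exc:
        print("naive pattern fails:", exc.msg)

    state = extract_initial_state(SAMPLE_HTML)
    print(['https://www.trendyol.com' + img for img in state['product']['images']])
```

With a real page you would pass r.text from requests.get(url) to extract_initial_state instead of SAMPLE_HTML; calling r.raise_for_status() first makes HTTP errors show up as such rather than as a missing-marker error.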