Tags: python, json, python-requests, python-re

How to get a JSON string from a website using Python?


I am trying to get the image links of products from a website. This works for some products but not for others. In the code below, url1 works, but url2 raises "json.decoder.JSONDecodeError". I think the problem is that my regular expression does not capture the full JSON string, and I am not good at regular expressions. How can I extract the JSON string?

Code

import re,json,requests
url1 =  "https://www.trendyol.com/samsung/akilli-smart-air-sihirli-led-tv-televizyon-kumandasi-yerine-tuslu-kumanda-1078-p-43447565?boutiqueId=61&merchantId=384846"
url2 = "https://www.trendyol.com/samsung/k-ve-m-serisi-uyumlu-led-lcd-tv-akilli-kumandasi-bn59-01259b-p-45735139?boutiqueId=61&merchantId=115135"
r = requests.get(url2)
data = json.loads(re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__=(.*?);', r.text).group(1))
images = ['https://www.trendyol.com' + img for img in data['product']['images']]
print(images)

Solution

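  • The following regex is a better match for the given URLs: it stops at the "}};" that closes the nested dictionaries, just before the next block starts. The original pattern "(.*?);" stops at the first semicolon it finds, so if a semicolon appears inside the JSON itself (most likely the case for url2), the captured string is cut short and json.loads fails.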

    import json
    import re

    import requests
    
    url1 =  "https://www.trendyol.com/samsung/akilli-smart-air-sihirli-led-tv-televizyon-kumandasi-yerine-tuslu-kumanda-1078-p-43447565?boutiqueId=61&merchantId=384846"
    url2 = "https://www.trendyol.com/samsung/k-ve-m-serisi-uyumlu-led-lcd-tv-akilli-kumandasi-bn59-01259b-p-45735139?boutiqueId=61&merchantId=115135"
    
    for url in [url1, url2]:
        r = requests.get(url)
        # Capture lazily up to the first "}};" (end of the nested state object)
        data = json.loads(re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__=(.*?\}\});', r.text).group(1))
        images = ['https://www.trendyol.com' + img for img in data['product']['images']]
        print(images)
        print("")
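A regex that ends at "}};" can still be fooled if that sequence ever appears inside a string value. As a hedged alternative, here is a minimal sketch that uses the regex only to find the marker and lets the standard library's json.JSONDecoder.raw_decode read exactly one JSON value from that position. The function name extract_initial_state, the optional whitespace around "=", and the sample page fragment are assumptions made for illustration; they are not taken from the real site.

```python
import json
import re

# Marker taken from the question; allowing whitespace around "=" is an assumption.
STATE_MARKER = re.compile(r'PRODUCT_DETAIL_APP_INITIAL_STATE__\s*=\s*')


def extract_initial_state(html):
    """Return the JSON value assigned after the marker, or None if the marker is absent."""
    match = STATE_MARKER.search(html)
    if match is None:
        return None
    # raw_decode parses one JSON value starting at the given index and
    # ignores whatever follows it, so ';' inside strings does no harm.
    data, _end = json.JSONDecoder().raw_decode(html, match.end())
    return data


# Made-up page fragment: the description contains ';' and '}};' on purpose.
sample_html = (
    '<script>window.__PRODUCT_DETAIL_APP_INITIAL_STATE__='
    '{"product":{"description":"fast; cheap }}; done","images":["/a.jpg","/b.jpg"]}};'
    'window.OTHER={};</script>'
)

state = extract_initial_state(sample_html)
images = ['https://www.trendyol.com' + img for img in state['product']['images']]
print(images)
```

With a real page, the same function could be called on r.text from requests.get(url). Whether the live markup still matches the marker is not something this sketch can guarantee.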
    
