I am trying to get the JSON data of products from a website. The code worked for around 400-500 products, but for the product in the screenshot it gives the error "AttributeError: 'NoneType' object has no attribute 'group'". I think the problem is occurring because of a double quotation mark, and I could not escape it: I tried \" but it still throws the error. How can I fix it?
import re, json, requests
r = requests.get("https://www.trendyol.com/xiaomi/64mp-note-9-pro-6gb-64gb-6-67-yesil-akilli-cep-telefonu-p-58882069")
data = json.loads(re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__=(.*?\}\});', r.text).group(1))
The regular expression you are using does not match the actual JavaScript source of the page, so re.search returns None and calling .group(1) on it raises the AttributeError.
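A minimal reproduction of the failure, using a hypothetical, shortened string in place of the real page source (like the real page, it has spaces around the =):

```python
import re

# Hypothetical, shortened stand-in for the page source. Like the real page,
# it has spaces around "=".
sample = 'window.__PRODUCT_DETAIL_APP_INITIAL_STATE__ = {"product":{"name":"x"}};'

# The pattern from the question expects "=" right after the variable name.
match = re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__=(.*?\}\});', sample)
print(match)  # None

try:
    match.group(1)
except AttributeError as exc:
    # Same kind of error as in the question: None has no .group() method.
    print(exc)
```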
With
re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__ = ({.*\}\});', r.text)
or, better
re.search(r'PRODUCT_DETAIL_APP_INITIAL_STATE__\s*=\s*({.*\}\})\s*;', r.text)
you will match the actual beginning of the JSON, which is
window.__PRODUCT_DETAIL_APP_INITIAL_STATE__ = {"product":{"attributes":[{"k
with spaces around the =.
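A minimal sketch of the corrected extraction, run against a hypothetical page snippet instead of a live request. The helper name extract_initial_state and the sample product name are made up for illustration. The sample name contains an escaped double quote (\") to show that json.loads handles it without any extra escaping once the regex matches, and the explicit None check replaces the opaque AttributeError with a readable message.

```python
import json
import re

# Whitespace-tolerant pattern: \s* allows optional spaces/newlines around "="
# and before ";". The opening brace is escaped here only for readability.
STATE_RE = re.compile(r'PRODUCT_DETAIL_APP_INITIAL_STATE__\s*=\s*(\{.*\}\})\s*;')


def extract_initial_state(html: str) -> dict:
    """Return the parsed initial-state object (hypothetical helper)."""
    match = STATE_RE.search(html)
    if match is None:
        # Fail with a clear message instead of AttributeError on .group().
        raise ValueError("PRODUCT_DETAIL_APP_INITIAL_STATE__ not found in page")
    return json.loads(match.group(1))


# Hypothetical page snippet; the product name contains an escaped quote.
sample = ('<script>window.__PRODUCT_DETAIL_APP_INITIAL_STATE__ = '
          '{"product":{"name":"6.67\\" screen"}};</script>')
state = extract_initial_state(sample)
print(state["product"]["name"])  # 6.67" screen
```

With the real page you would pass r.text from the requests.get call in the question instead of sample.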
Using an HTML parser or Selenium seems like overkill for this use case, because you are hacking into something that was never designed to be an interface anyway, and it can change from one day to the next.
Instead, when fiddling with one-off regular expressions, use a tool like https://regex101.com to get them right in a controlled environment :)
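Because the page can change at any time, one option that depends less on how the object happens to end is to use the regex only to find where the object starts and let the standard library's JSON decoder find where it ends. This is a swapped-in technique, not the regex approach above: json.JSONDecoder().raw_decode parses one JSON value from the start of a string and returns it together with the index where parsing stopped, ignoring whatever follows. The helper name and sample text are hypothetical.

```python
import json
import re

# Match only the start of the assignment; the decoder determines the end.
START_RE = re.compile(r'PRODUCT_DETAIL_APP_INITIAL_STATE__\s*=\s*')


def extract_initial_state_raw(html: str) -> dict:
    """Parse the object after the assignment (hypothetical helper)."""
    match = START_RE.search(html)
    if match is None:
        raise ValueError("PRODUCT_DETAIL_APP_INITIAL_STATE__ not found in page")
    # raw_decode reads one JSON value from the start of the slice and stops
    # there, so trailing ";", other statements and "</script>" are ignored.
    obj, _end = json.JSONDecoder().raw_decode(html[match.end():])
    return obj


# Hypothetical snippet with a second assignment on the same line.
sample = ('<script>window.__PRODUCT_DETAIL_APP_INITIAL_STATE__ = '
          '{"product":{"name":"6.67\\" screen"}};'
          'window.OTHER = {"a":{"b":1}};</script>')
print(extract_initial_state_raw(sample)["product"]["name"])
```

Since the end of the object is found by the parser, this does not rely on a "}});" or "}};" terminator, so a later assignment on the same line does not get swallowed into the match.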