So I'm scraping a website to get its data. It's a WooCommerce website, and each product has multiple variations with different prices.
Scraping with BeautifulSoup, I'm able to get the whole product and variant information, but some strings are unreadable.
import json
import requests
from bs4 import BeautifulSoup

product_page = requests.get(single_product_url)
product_soup = BeautifulSoup(product_page.content, "html.parser")
product_form = product_soup.find("form", {"class": "variations_form cart"})
variations_json = json.loads(product_form["data-product_variations"])
# each item in the list is one variation of the product
for item in variations_json:
    attributes = item["attributes"]
    variant_title = attributes["attribute_pa_flavor"]
    print(variant_title)
The output is: "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d"
The JSON I get has all variant information such as 'is_in_stock', prices, and discounts for each variant.
I need more than just the variant titles: I need the whole variant data.
How do I convert "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d"
to a normal string?
I tried encoding and decoding - no success.
Thanks!
You can do this with urllib.parse.unquote, since the string is percent-encoded (URL-encoded) UTF-8. I used Python 3.x:
In [9]: import urllib.parse
In [10]: urllib.parse.unquote(
...: "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d"
...: )
Out[10]: 'סיגר-עוף-100-גרם'
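Since you need the whole variant data and not just the title, here is a minimal sketch of applying the same unquote call to every attribute value while leaving the other fields untouched. The helper name decode_attributes and the sample data (including the display_price field) are made up for illustration; in your case you would pass in the items of your variations_json list.

```python
from urllib.parse import unquote


def decode_attributes(variation):
    """Return a copy of one variation dict with percent-decoded attribute values.

    Hypothetical helper, not part of WooCommerce or BeautifulSoup.
    """
    decoded = dict(variation)  # shallow copy so the input is not modified
    decoded["attributes"] = {
        name: unquote(value)
        for name, value in variation.get("attributes", {}).items()
    }
    return decoded


# Made-up sample standing in for the parsed data-product_variations JSON
sample_variations = [
    {
        "attributes": {
            "attribute_pa_flavor": "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d"
        },
        "is_in_stock": True,
        "display_price": 25.0,
    }
]

decoded_variations = [decode_attributes(v) for v in sample_variations]
print(decoded_variations[0]["attributes"]["attribute_pa_flavor"])
```

unquote decodes as UTF-8 by default, which matches the Hebrew text here. Fields such as is_in_stock and the prices pass through unchanged.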