python, web-scraping, unicode

Python scraping Hebrew - how to convert the string "%d7%a1%d7%99%d7%92%d7%a8" to readable text


I'm scraping a website to get its data. It's a WooCommerce website, and a product has multiple variations with different prices.

Scraping with BeautifulSoup, I'm able to get the whole product and variant information, but some strings are unreadable.

Specific product page: https://dogo.co.il/product/%d7%9b%d7%a0%d7%a2%d7%9f-%d7%97%d7%98%d7%99%d7%a4%d7%99%d7%9d-%d7%9c%d7%9b%d7%9c%d7%91%d7%99%d7%9d-%d7%91%d7%9e%d7%92%d7%95%d7%95%d7%9f-%d7%98%d7%a2%d7%9e%d7%99%d7%9d-60-100-%d7%92%d7%a8%d7%9d/

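import json

import requests
from bs4 import BeautifulSoup

# single_product_url holds the product page URL shown above
product_page = requests.get(single_product_url)
product_soup = BeautifulSoup(product_page.content, "html.parser")

# The form's data-product_variations attribute holds the variants as JSON
product_form = product_soup.find("form", {"class": "variations_form cart"})
variations_json = json.loads(product_form["data-product_variations"])
for item in variations_json:
    attributes = item["attributes"]
    variant_title = attributes["attribute_pa_flavor"]
    print(variant_title)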

The output is: "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d"

The JSON I get has all variant information such as 'is_in_stock', prices, and discounts for each variant.

I need more than the variant titles: I need the whole variant data.

How do I convert "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d" to a normal string?

I tried encoding and decoding the string, with no success.

Thanks!


Solution

  • You can do this with urllib.parse.unquote: the string is percent-encoded (URL-encoded) UTF-8. I used Python 3.x.

    In [9]: import urllib.parse
    
    In [10]: urllib.parse.unquote(
        ...:     "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d"
        ...: )
    Out[10]: 'סיגר-עוף-100-גרם'
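
Since the question asks for the whole variant data rather than just the title, the same call can be applied to every attribute value of a variation while leaving the other fields alone. The following is only a minimal sketch: the helper name decode_attributes and the sample dict are hypothetical, and the sample merely imitates the shape described in the question (an "attributes" mapping next to fields such as 'is_in_stock'); the real JSON may contain more keys.

```python
from urllib.parse import unquote


def decode_attributes(variation):
    """Return a shallow copy of one variation with its attribute values percent-decoded.

    Hypothetical helper: assumes variation["attributes"] maps names to strings.
    """
    decoded = dict(variation)  # copy so the original dict is not modified
    decoded["attributes"] = {
        key: unquote(value)
        for key, value in variation.get("attributes", {}).items()
    }
    return decoded


# Hypothetical sample shaped like one entry of the parsed variations JSON
sample = {
    "attributes": {
        "attribute_pa_flavor": "%d7%a1%d7%99%d7%92%d7%a8-%d7%a2%d7%95%d7%a3-100-%d7%92%d7%a8%d7%9d"
    },
    "is_in_stock": True,
}

result = decode_attributes(sample)
print(result["attributes"]["attribute_pa_flavor"])
```

Applied to the question's code, this would be something like [decode_attributes(item) for item in variations_json], which keeps prices, stock flags, and discounts as they are and only makes the attribute strings readable.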