Search code examples
pythonpython-3.xencodinglxml.html

lxml in Python: Scraping shows only English characters (others are garbled)


Here is my code:

import requests
from lxml.etree import HTML
title_req = requests.get("https://www.youtube.com/watch?v=VK3QWm7jvZs")
title_main = HTML(title_req.content)
title = title_main.xpath("//span[@id='eow-title']/@title")[0]
print(title)
>> Halsey - Without Me - Ù\x85ترجÙ\x85Ø© عربÙ\x8a

I want it to be like this:

>> Halsey - Without Me - مترجمة عربي

I tried to add UTF-8 encoding but its not working

Thanks.


Solution

  • I don't know why but this line is creating problem.

    title_main = HTML(title_req.content)
    

    change it to

    title_main = HTML(title_req.text)
    

    I will try to know why.