Writing Persian text to a text file so that it can be read back in Python


I have developed a simple program that sends a request to a Persian web server and gets the source code of the main page. Then I convert it to a string, call file.open(new_file, 'w') and write the string to the file.

When I print the string in Python IDLE I can see the correct Persian words, but the text file created in the directory contains sequences like \xd9\x8a\xd8\xb9\n.

Here is the code:

import urllib.request as ul
import sys

url = 'http://www.uut.ac.ir/'
resp = ul.urlopen(url).read()
string = str(resp)
create_file(filename , string)   # this function creates a text file in desktop

I also used:

file.open(new_file , 'w' , encoding = 'utf-8')
string = resp.encode('utf-8')

But nothing changed. Any help would be appreciated.


Solution

  • Look at your code:

    >>> resp = ul.urlopen(url).read()
    >>> type(resp)
    <class 'bytes'>
    
    1. resp has the type bytes. In the next line you used:
    string = str(resp)
    

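    But you forgot to specify the encoding. Without it, str() returns the printable representation of the bytes object (b'...\xd9\x8a...'), which is why the file contains escape sequences. The correct call is: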

    string = str(resp, encoding="utf-8")
    

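    Now you get the correct string and can write it directly to your file. Open the file with encoding='utf-8' as well, so the Persian characters are written as UTF-8 regardless of the platform default.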

    2. Your second attempt is wrong. resp is bytes, so it has to be decoded into a str; use decode instead of encode:
    string = resp.decode('utf-8')
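
Putting both points together, here is a minimal sketch of the whole round trip. It assumes the page is served as UTF-8 and uses the sample bytes from the question instead of a live request. The helper name save_decoded and the temporary file name are made up for illustration; they are not part of the original code.

```python
import os
import tempfile


def save_decoded(raw: bytes, path: str, encoding: str = "utf-8") -> str:
    """Decode raw bytes to str and write the text to path (hypothetical helper)."""
    text = raw.decode(encoding)
    # newline="" keeps "\n" untouched, so the bytes on disk match the input
    with open(path, "w", encoding=encoding, newline="") as f:
        f.write(text)
    return text


# Sample bytes taken from the question (what urlopen(...).read() returns is bytes)
raw = b"\xd9\x8a\xd8\xb9\n"

wrong = str(raw)                    # repr of the bytes object, escapes included
right = str(raw, encoding="utf-8")  # real text, same as raw.decode("utf-8")

# Write the decoded text to a temporary file and read it back
path = os.path.join(tempfile.gettempdir(), "persian_demo.txt")
save_decoded(raw, path)

with open(path, encoding="utf-8") as f:
    read_back = f.read()

print(wrong)
print(read_back == right)
```

If the server uses a different encoding, pass that encoding to decode (and to open) instead of 'utf-8'.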