python, decode, encode

Python: Chinese text decoded as garbled special characters


I use a scrape/curl request to fetch HTML from another site whose content is in Chinese, but some of the resulting text looks wrong. It shows up like this:

°¢Àï°Í°ÍΪÄúÌṩÁË×ÔÁµÕß¹¤³§Ö±ÏúÆ·ÅƵç×Ó±í ÖÇÄÜʱÉг±Á÷ŮʿÊÖ»·ÊÖÁ´Ê×Êαí´øµÈ²úÆ·£¬ÕâÀïÔƼ¯ÁËÖÚ¶àµÄ¹©Ó¦ÉÌ£¬²É¹ºÉÌ£¬ÖÆÔìÉÌ¡£ÓûÁ˽â¸ü¶à×ÔÁµÕß¹¤³§Ö±ÏúÆ·ÅƵç×Ó±í ÖÇÄÜʱÉг±Á÷ŮʿÊÖ»·ÊÖÁ´Ê×Êαí´øÐÅÏ¢£¬Çë·ÃÎÊ°¢Àï°Í°ÍÅú·¢Íø£¡

It should be Chinese text. This is my code:

str(result.decode('ISO-8859-1'))

Without decoding as 'ISO-8859-1' (returning the result variable as-is), it displays question marks like this:

����Ͱ�Ϊ���ṩ�������߹���ֱ��Ʒ�Ƶ��ӱ� ����ʱ�г���Ůʿ�ֻ��������α����Ȳ�Ʒ�������Ƽ����ڶ�Ĺ�Ӧ�̣��ɹ��̣������̡����˽���������߹���ֱ��Ʒ�Ƶ��ӱ� ����ʱ�г���Ůʿ�ֻ��������α�����Ϣ������ʰ���Ͱ���������

Could you tell me which encoding I should use to decode it?

Thanks


Solution

  • The solution turned out to be simple. As @Thu Yein tun mentioned, I checked the response headers of the HTTP request for the content type, which showed text/html;charset=GBK. So I changed my code to decode with that charset:

    result.decode('gbk')
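
    Rather than hard-coding the charset, it can be read from the Content-Type header. The following is only a rough sketch using the standard library: the helper names charset_from_content_type and decode_body are made up for illustration, the UTF-8 fallback is an assumption, and the sample bytes are built by hand instead of coming from a real request.

    ```python
    from email.message import Message


    def charset_from_content_type(content_type, default="utf-8"):
        """Return the charset named in a Content-Type header value.

        Falls back to `default` (an assumption, not part of the original
        answer) when the header carries no charset parameter.
        """
        msg = Message()
        msg["Content-Type"] = content_type
        # get_content_charset() returns the charset lowercased, or None.
        return msg.get_content_charset() or default


    def decode_body(body, content_type):
        """Decode raw response bytes with the charset from the header."""
        charset = charset_from_content_type(content_type)
        # errors="replace" avoids an exception on stray invalid bytes.
        return body.decode(charset, errors="replace")


    # Stand-in for the raw bytes returned by the request.
    raw = "阿里巴巴".encode("gbk")

    text = decode_body(raw, "text/html;charset=GBK")
    print(text)

    # Decoding the same bytes as ISO-8859-1 maps every byte to one Latin-1
    # character, which produces the garbled text from the question.
    mojibake = raw.decode("iso-8859-1")
    print(mojibake)

    # Text that was already mis-decoded this way can be recovered, because
    # ISO-8859-1 round-trips every byte value.
    recovered = mojibake.encode("iso-8859-1").decode("gbk")
    print(recovered)
    ```

    This also explains the two symptoms in the question: ISO-8859-1 never fails to decode, so it silently yields wrong characters, while displaying the GBK bytes as UTF-8 yields replacement question marks.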