
How can I make Scrapy's log output display CJK characters on Debian the same way it does on Windows?


import scrapy
from info.items import InfoItem

class InfoSpider(scrapy.Spider):
    name = 'info'
    allowed_domains = ['quotes.money.163.com']
    start_urls = ["http://quotes.money.163.com/f10/gszl_600023.html"]

    def parse(self, response):
        # Use the item class that is actually imported above.
        item = InfoItem()
        item["content"] = response.xpath("/html/body/div[2]/div[4]/table/tr[2]/td[2]").extract()[0]
        yield item

When I run the spider above on my Windows machine (Windows 7, English edition), the Scrapy log shows:

2019-04-27 23:27:41 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: info)
2019-04-27 23:27:41 [scrapy.utils.log] INFO: Versions: lxml 4.3.2.0, libxml2 2.9.5, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 18.9.0, Python 3.7.2 (tags/v3.7.2:9a3ffc0492, Dec 23 2018, 23:09:28) [MSC v.1916 64 bit (AMD64)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1b  26 Feb 2019), cryptography 2.6.1, Platform Windows-7-6.1.7601-SP1
2019-04-27 23:27:41 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'info', 'FEED_EXPORT_FIELDS': ['content'], 'NEWSPIDER_MODULE': 'info.spiders', 'SPIDER_MODULES': ['info.spiders']}
2019-04-27 23:27:53 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.money.163.com/f10/gszl_600023.html> (referer: None)
2019-04-27 23:27:53 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.money.163.com/f10/gszl_600023.html>
{'content': ['浙能电力']}

When I run it on my Linux machine (Debian 9), the Scrapy log shows:

2019-04-28 07:12:00 [scrapy.utils.log] INFO: Scrapy 1.6.0 started (bot: info)
2019-04-28 07:12:00 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.9, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 16.6.0, Python 2.7.13 (default, Sep 26 2018, 18:42:22) - [GCC 6.3.0 20170516], pyOpenSSL 16.2.0 (OpenSSL 1.1.0j  20 Nov 2018), cryptography 1.7.1, Platform Linux-4.9.0-8-amd64-x86_64-with-debian-9.8
2019-04-28 07:12:00 [scrapy.crawler] INFO: Overridden settings: {'NEWSPIDER_MODULE': 'info.spiders', 'SPIDER_MODULES': ['info.spiders'], 'FEED_EXPORT_ENCODING': 'utf-8', 'BOT_NAME': 'info'}
2019-04-28 07:12:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.money.163.com/f10/gszl_600023.html> (referer: None)
2019-04-28 07:12:01 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.money.163.com/f10/gszl_600023.html>
{'content': u'\u6d59\u80fd\u7535\u529b'}

The locale settings on my Debian 9 machine:

locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

How can I make the Scrapy output on Debian show the same CJK characters as on Windows?

There are two Python versions on my Debian machine:

/usr/local/lib/python3.5/dist-packages/pip
/usr/local/lib/python2.7/dist-packages/pip

My Scrapy 1.6 is installed under Python 2.7; maybe reinstalling Scrapy under Python 3 would solve the issue.


Solution

  • The version information in the Scrapy log on my Debian machine shows that Scrapy is running under Python 2.7.13:

    2019-04-28 07:12:00 [scrapy.utils.log] INFO: Versions: lxml 4.3.3.0, libxml2 2.9.9, cssselect 1.0.3, parsel 1.5.1, w3lib 1.20.0, Twisted 16.6.0, Python 2.7.13 (default, Sep 26 2018, 18:42:22) 
    

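    There are two Python versions on my Debian machine, Python 2.7 and Python 3.5, and Scrapy was installed under Python 2.7. Python 2 prints a unicode string's repr with non-ASCII characters escaped (u'\u6d59\u80fd\u7535\u529b'), whereas Python 3 prints the characters themselves. Uninstall Scrapy from Python 2.7 and reinstall it under Python 3.5: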

    sudo pip uninstall Scrapy
    sudo pip3 install Scrapy
    

    Now when I run the spider, the output shows the same CJK characters on Debian as on Windows 7.
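
The difference between the two logs can be reproduced without Scrapy. The following is only a minimal sketch, run under Python 3: it shows that the escaped form from the Debian log and the CJK text from the Windows log are the same string, and that only the way the item is printed differs. The built-in ascii() is used here to approximate how Python 2 would print the item (Python 2 additionally adds a u prefix to the string); the variable names are made up for illustration.

```python
# The escaped form seen in the Debian (Python 2) log.
text = "\u6d59\u80fd\u7535\u529b"

# A plain dict standing in for the scraped item (illustrative only).
item = {"content": text}

# Python 3 repr() keeps printable non-ASCII characters as they are.
py3_style = repr(item)

# ascii() escapes non-ASCII characters, similar to Python 2's repr()
# (minus the u'' prefix that Python 2 would add).
py2_style = ascii(item)

print(py3_style)
print(py2_style)
```

So the scraped data is identical on both machines; the Python version that Scrapy runs under only changes how the item is displayed in the log.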