I am attempting to run my own Scrapy project. I thought I resolved a related issue in a thread I posted here: "urlparse: ModuleNotFoundError, presumably in Python2.7 and under conda".
I did a complete system image restore and simply installed Python 2.7 and Miniconda. However, Atom Editor is still flagging/underlining 'import urlparse'.
The code is based on a well-written book, and the author provides a great VM playground for running the example scripts from the book. In the VM the code works fine.
However, in an attempt to practice on my own, I now receive the following error:
(p2env) C:\Users\User-1\Desktop\scrapy_projects\dictionary>scrapy crawl basic
Traceback (most recent call last):
File "C:\Users\User-1\Miniconda2\envs\p2env\Scripts\scrapy-script.py", line 5, in <module>
File "C:\Users\User-1\Miniconda2\envs\p2env\lib\site-packages\scrapy\cmdline.py", line 148, in execute cmd.crawler_process = CrawlerProcess(settings)
File "C:\Users\User-1\Miniconda2\envs\p2env\lib\site-packages\scrapy\crawler.py", line 243, in __init__
super(CrawlerProcess, self).__init__(settings)
File "C:\Users\User-1\Miniconda2\envs\p2env\lib\site-packages\scrapy\crawler.py", line 134, in __init__
self.spider_loader = _get_spider_loader(settings)
File "C:\Users\User-1\Miniconda2\envs\p2env\lib\site-packages\scrapy\crawler.py", line 330, in _get_spider_loader
return loader_cls.from_settings(settings.frozencopy())
File "C:\Users\User-1\Miniconda2\envs\p2env\lib\site-packages\scrapy\spiderloader.py", line 61, in from_settings
return cls(settings)
File "C:\Users\User-1\Miniconda2\envs\p2env\lib\site-packages\scrapy\spiderloader.py", line 25, in __init__
File "C:\Users\User-1\Miniconda2\envs\p2env\lib\site-packages\scrapy\spiderloader.py", line 47, in _load_all_spiders
for module in walk_modules(name):
File "C:\Users\User-1\Miniconda2\envs\p2env\lib\site-packages\scrapy\utils\misc.py", line 71, in walk_modules
submod = import_module(fullpath)
File "C:\Users\User-1\Miniconda2\envs\p2env\lib\importlib\__init__.py", line 37, in import_module
File "C:\Users\User-1\Desktop\scrapy_projects\dictionary\dictionary\spiders\basic.py", line 11, in <module>
from terms.items import TermsItem
ImportError: No module named terms.items
My folder hierarchy is as follows:
│ scrapy.cfg
│ items.py
│ middlewares.py
│ pipelines.py
│ settings.py
│ settings.pyc
│ __init__.py
│ __init__.pyc
My items.py code is as follows:
# -*- coding: utf-8 -*-
from scrapy.item import Item, Field
class TermsItem(Item):
    # Primary fields
    title = Field()
    definition = Field()
    # Housekeeping fields
    url = Field()
    project = Field()
    spider = Field()
    server = Field()
    date = Field()
My spider.py is as follows:
# -*- coding: utf-8 -*-
import datetime
import urlparse
import socket
import scrapy
from scrapy.loader.processors import MapCompose, Join
from scrapy.loader import ItemLoader
from terms.items import TermsItem
class BasicSpider(scrapy.Spider):
    name = "basic"
    allowed_domains = ["web"]
    # Start on a property page
    start_urls = (
    def parse(self, response):
        # Create the loader using the response
        l = ItemLoader(item=TermsItem(), response=response)
        # Load fields using XPath expressions
        l.add_xpath('title', '//h1[@class="head-entry"][1] / text()',
                    MapCompose(unicode.strip, unicode.title))
        l.add_xpath('definition', '//*[@class="def-list"][1]/text()',
                    MapCompose(unicode.strip, unicode.title))
        # Housekeeping fields
        l.add_value('url', response.url)
        l.add_value('project', self.settings.get('BOT_NAME'))
        l.add_value('spider', self.name)
        l.add_value('server', socket.gethostname())
        l.add_value('date', datetime.datetime.now())
        return l.load_item()
Based on the Stack Overflow question "Scrapy ImportError: No module named Item", where coders are instructed to "**execute the Scrapy command from inside the top level directory of your project**" (alecxe), I am wondering whether the conda environment I am using is causing the error. The question "No module named items" makes a similar point: "**What is doing the import? And what is the working directory/the contents of sys.path. You can't find Project_L if the parent directory isn't the working directory and doesn't appear in sys.path.**" (ShadowRanger, May 11 at 22:24). However, to the best of my knowledge, I am structuring the project correctly and the folder hierarchy is correct.
Any help would be greatly appreciated. Apologies for the lengthy post; I just wanted to be as comprehensive as possible and make sure that people appreciate the difference between this question and the similar ones I have linked to.
The backtrace indicates the problem is in a specific import. You can verify this from the command line. For example, on my machine I get:
$ python -c 'from terms.items import TermsItem'
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: No module named terms.items
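The same check can be made from inside Python, which also lets you print the working directory and the search path at the moment the import fails. This is only a sketch: check_import is a hypothetical helper name, not part of Scrapy or the standard library, and it is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function

import importlib
import os
import sys


def check_import(module_name):
    """Try to import module_name.

    Returns (True, file the module was loaded from) on success,
    or (False, the ImportError message) on failure.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError as exc:
        return False, str(exc)
    # Built-in modules have no __file__, so fall back to a placeholder.
    return True, getattr(module, "__file__", "<built-in>")


if __name__ == "__main__":
    ok, detail = check_import("terms.items")
    print("terms.items importable:", ok, "-", detail)
    if not ok:
        # Python only searches these locations for top-level packages.
        print("working directory:", os.getcwd())
        for entry in sys.path:
            print("  sys.path entry:", entry)
```

On a machine where the import succeeds, the second value is the file the module was loaded from, which tells you which directory supplies the "terms" package.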
Looking at your folder hierarchy, I see no module called "terms", so that's probably what you're missing. But since you indicate the code works in the author's VM, I would try running the following command in that VM:
$ python -v -c 'from terms.items import TermsItem'
The -v option makes Python print each module as it is imported, along with the file it was loaded from.
$ python -v -c 'from terms.items import TermsItem'
# installing zipimport hook
import zipimport # builtin
# installed zipimport hook
# /usr/local/var/pyenv/versions/2.7.12/lib/python2.7/site.pyc matches /usr/local/var/pyenv/versions/2.7.12/lib/python2.7/site.py
import site # precompiled from /usr/local/var/pyenv/versions/2.7.12/lib/python2.7/site.pyc
import encodings.ascii # precompiled from /usr/local/var/pyenv/versions/2.7.12/lib/python2.7/encodings/ascii.pyc
Python 2.7.12 (default, Nov 29 2016, 14:57:54)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
Traceback (most recent call last):
File "<string>", line 1, in <module>
ImportError: No module named terms.items
# clear __builtin__._
# clear sys.path
# cleanup ints: 20 unfreed ints
# cleanup floats
If you do that where the code is working then somewhere in that output will be a successful import. From that you may be able to work out the name of the missing module on your system and install it accordingly.
EDIT: looking closer at your post I notice you mention your "items.py" contains
class TermsItem(Item):
# Primary fields
so I suspect your problem is that your import should be
from items import TermsItem
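One way to see which import path can resolve is to mimic the layout in a scratch directory. The sketch below rests on two assumptions: that your project package is named "dictionary", as the path dictionary\dictionary\spiders\basic.py in your traceback suggests, and that the directory holding scrapy.cfg ends up on sys.path when you run scrapy (the sys.path.insert call stands in for that). It uses a plain class instead of a real Scrapy Item so that it runs without Scrapy installed; build_fake_project, can_import and compare_imports are hypothetical helper names.

```python
from __future__ import print_function

import importlib
import os
import shutil
import sys
import tempfile


def build_fake_project(root):
    """Create root/dictionary/__init__.py and items.py (assumed layout)."""
    package_dir = os.path.join(root, "dictionary")
    os.makedirs(package_dir)
    open(os.path.join(package_dir, "__init__.py"), "w").close()
    with open(os.path.join(package_dir, "items.py"), "w") as handle:
        # Stand-in for the real Scrapy Item subclass.
        handle.write("class TermsItem(object):\n    pass\n")


def can_import(module_name):
    """Return True if module_name imports with the current sys.path."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


def compare_imports():
    """Report which of the two candidate module paths can be imported."""
    root = tempfile.mkdtemp()
    try:
        build_fake_project(root)
        sys.path.insert(0, root)  # stands in for the project directory
        try:
            return {
                name: can_import(name)
                for name in ("terms.items", "dictionary.items")
            }
        finally:
            # Undo the changes so the experiment leaves no trace behind.
            sys.path.remove(root)
            for name in ("dictionary", "dictionary.items"):
                sys.modules.pop(name, None)
    finally:
        shutil.rmtree(root)


if __name__ == "__main__":
    for name, ok in sorted(compare_imports().items()):
        print(name, "->", "importable" if ok else "not importable")
```

If this reports dictionary.items as importable and terms.items as not, then under the same assumptions the package-qualified form, from dictionary.items import TermsItem, may be what basic.py needs. Adjust the package name to whichever folder actually holds your items.py.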