Tags: python-2.7, machine-learning, artificial-intelligence, documentfilter

What is wrong with the following piece of code?


I have the following piece of code, copied from the book Programming Collective Intelligence (page 118, chapter "Document Filtering"). The function breaks the text into words by splitting it on any character that isn't a letter. This leaves only actual words, all converted to lowercase.

import re                                          
import math
def getwords(doc):
    splitter=re.compile('\\W*')
    words=[s.lower() for s in splitter.split(doc) 
           if len(s)>2 and len(s)<20]
    return dict([(w,1) for w in words])

I implemented the function and got the following error:

>>> import docclas
>>> t=docclass.getwords(s)
Traceback (most recent call last):
  File "<pyshell#15>", line 1, in <module>
    t=docclass.getwords(s)
  File "docclass.py", line 6, in getwords
    words=[s.lower() for s in splitter.split(doc)
NameError: global name 'splitter' is not defined

Solution

  • It works here

    >>> import re
    >>> 
    >>> def getwords(doc):
    ...     splitter=re.compile('\\W*')
    ...     words=[s.lower() for s in splitter.split(doc) 
    ...            if len(s)>2 and len(s)<20]
    ...     return dict([(w,1) for w in words])
    ... 
    >>> getwords ("He's fallen in the water!");
    {'water': 1, 'the': 1, 'fallen': 1}
    

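    I'm guessing you made a typo in your code, but got it right when you pasted it here. The NameError means that no variable named splitter exists at the point where splitter.split(doc) runs, so check that the name assigned on the re.compile line in docclass.py is spelled exactly the same way.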
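
    To illustrate that guess, here is a minimal sketch that reproduces the same kind of error. The misspelled name spliter and the function name getwords_typo are hypothetical, since the actual contents of docclass.py were not posted. The sketch also uses '\W+' rather than the book's '\W*', an assumption made so that it behaves the same on Python 2.7 and Python 3 (newer Python 3 releases also split on the empty matches that '\W*' allows).

```python
import re


def getwords_typo(doc):
    # Hypothetical typo: the pattern is bound to "spliter" (one "t") ...
    spliter = re.compile(r'\W+')
    # ... but looked up as "splitter", a name that was never assigned.
    words = [s.lower() for s in splitter.split(doc)
             if len(s) > 2 and len(s) < 20]
    return dict([(w, 1) for w in words])


def getwords(doc):
    # Same function with the name spelled consistently.
    splitter = re.compile(r'\W+')
    words = [s.lower() for s in splitter.split(doc)
             if len(s) > 2 and len(s) < 20]
    return dict([(w, 1) for w in words])


try:
    getwords_typo("He's fallen in the water!")
except NameError as err:
    # The message names the missing variable, as in the traceback above.
    message = str(err)
    print(message)

result = getwords("He's fallen in the water!")
print(result)
```

    If the real file contains a mismatch like this, making the two spellings agree and re-importing the module should make the error go away.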