How to save NLTK concordance results in a list?


I'm using NLTK to find a word in a text, and I need to save the result of the concordance function in a list. This question has already been asked here, but the answers did not change anything for me. I checked the type of the function's return value with:

type(text.concordance('myword'))

The result was:

<class 'NoneType'>

Solution

  • Inspecting the source of ConcordanceIndex shows that the results are printed to stdout and nothing is returned, which is why the type is NoneType. If redirecting stdout to a file is not an option, you have to reimplement ConcordanceIndex.print_concordance so that it returns the results rather than printing them.

    Code:

    def concordance(ci, word, width=75, lines=25):
        """
        Rewrite of nltk.text.ConcordanceIndex.print_concordance that returns results
        instead of printing them. 
    
        See:
        http://www.nltk.org/api/nltk.html#nltk.text.ConcordanceIndex.print_concordance
        """
        half_width = (width - len(word) - 2) // 2
        context = width // 4 # approx number of words of context
    
        results = []
        offsets = ci.offsets(word)
        if offsets:
            lines = min(lines, len(offsets))
            for i in offsets:
                if lines <= 0:
                    break
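                # Clamp the start index: for a match near the beginning of
                # the text, a negative start would wrap around and drop the
                # left context.
                left = (' ' * half_width +
                        ' '.join(ci._tokens[max(0, i-context):i]))
                right = ' '.join(ci._tokens[i+1:i+context])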
                left = left[-half_width:]
                right = right[:half_width]
                results.append('%s %s %s' % (left, ci._tokens[i], right))
                lines -= 1
    
        return results
    

    Usage:

    from nltk.book import text1
    from nltk.text import ConcordanceIndex
    
    ci = ConcordanceIndex(text1.tokens)
    results = concordance(ci, 'circumstances')
    
    print(type(results))
    # <class 'list'>
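
If redirecting stdout is acceptable, it does not have to go to a file: the printed output can be captured in memory and split into lines. The following is only a minimal sketch using the standard library. The helper name capture_printed_lines and the stand-in function fake_concordance are hypothetical and exist only for illustration; the sketch assumes the target function writes its results with print to sys.stdout, as described above.

```python
import contextlib
import io


def capture_printed_lines(func, *args, **kwargs):
    """Call func and return everything it printed to stdout as a list of lines."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue().splitlines()


def fake_concordance(word):
    # Hypothetical stand-in for text.concordance: it prints and returns None.
    print("Displaying 2 of 2 matches:")
    print("under the %s of the voyage" % word)
    print("in such %s as these" % word)


lines = capture_printed_lines(fake_concordance, "circumstances")
print(type(lines))
print(len(lines))
```

With NLTK, the same helper could presumably be called as capture_printed_lines(text1.concordance, 'circumstances'). The first captured line would then be whatever header the function prints, so it may need to be dropped before using the list.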