
Difference between 'sample' and 'samples' keyword in Python NLTK ConditionalFreqDist


I am computing the frequency distribution of some words across different genres of the Brown corpus.

My code:

import nltk
from nltk.corpus import brown

cfd = nltk.ConditionalFreqDist(
      (genre, word)
      for genre in brown.categories()
      for word in brown.words(categories = genre))

genres = ['news', 'religion', 'hobbies', 'science_fiction', 'romance', 'humor']
modals = ['can', 'could', 'may', 'might', 'must', 'will']

cfd.tabulate(conditions = genres, samples = modals)

Output of the code above:

                 can  could  may  might  must  will
           news   93     86   66     38    50   389
       religion   82     59   78     12    54    71
        hobbies  268     58  131     22    83   264
science_fiction   16     49    4     12     8    16
        romance   74    193   11     51    45    43
          humor   16     30    8      8     9    13

But when I replace 'samples' with 'sample' in the last line of the code above, it gives the frequency distribution for every word in the corpus.

What is the difference between 'sample' and 'samples'?

Thank you.


Solution

  • cfd.tabulate() accepts arbitrary keyword arguments and simply ignores any that its implementation doesn't look up. That's why sample=modals still produces the full table, with a column for every word in the corpus: the misspelled keyword is never read, so no filter on the samples is applied. If you leave it out altogether, the effect should be the same.

    This behavior is not NLTK-specific; it can happen with any Python function or method that accepts arbitrary keyword arguments (**kwargs) and only reads the ones it knows about. I would recommend reading the Python Tutorial section on arbitrary argument lists; I find it very clear.
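
    To see the effect without NLTK, here is a minimal sketch. The functions tabulate and strict_tabulate and the all_samples list are hypothetical stand-ins written for illustration, not NLTK's actual implementation; they only assume that the real method reads its options out of **kwargs by name.

```python
def tabulate(**kwargs):
    """Toy stand-in for a method that reads its options from **kwargs."""
    all_samples = ["can", "could", "may", "the", "of"]  # made-up "full" sample list
    # Only the key "samples" is looked up; any other key is never read.
    return kwargs.get("samples", all_samples)


def strict_tabulate(*, samples=None):
    """Variant with an explicit keyword-only parameter instead of **kwargs."""
    all_samples = ["can", "could", "may", "the", "of"]
    return all_samples if samples is None else samples


modals = ["can", "could", "may"]

print(tabulate(samples=modals))  # restricted to the modals
print(tabulate(sample=modals))   # misspelled key is ignored, so the default is used
print(tabulate())                # same result as the misspelled call

try:
    strict_tabulate(sample=modals)
except TypeError as err:
    # An explicit signature turns the typo into an error instead of a silent no-op.
    print("rejected:", err)
```

    The second and third calls return the same list, which mirrors what happens with sample=modals in the question. A function with explicitly named parameters, like the strict variant, would raise a TypeError for the misspelled keyword instead.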