
Textacy - Vectorizer Weighting Error


I recently found Textacy, and as I work through the API reference guide I'm running into an error with the Vectorizer. If I pass any of the options listed in the API reference, I get a TypeError: unexpected keyword argument. This happens for other options as well, not just weighting.

I installed textacy using pip and I'm using Python 3 on Ubuntu. Any help is appreciated. Thanks!

vectorizer = textacy.vsm.Vectorizer(weighting='tfidf')

TypeError: __init__() got an unexpected keyword argument 'weighting'

Solution

  • I ran into the same problem. The API documentation does not reflect the current Vectorizer keyword arguments. The Vectorizer now takes a different set of keyword arguments that give more control over how TF-IDF is applied.

    vectorizer = textacy.vsm.Vectorizer(tf_type='linear', apply_idf=True, idf_type='smooth')

    tf_type='linear' uses the standard term frequency (TF), i.e. the raw count of each term, and apply_idf=True additionally weights those counts by inverse document frequency (IDF). According to the comments in the repo, idf_type='smooth' adds one to each document frequency to avoid division by zero.

    For more information about the options, see the comment at line 182 of vectorizers.py in the repository: https://github.com/chartbeat-labs/textacy/blob/master/textacy/vsm/vectorizers.py
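
To make the effect of those three options more concrete, here is a rough pure-Python sketch of linear TF combined with a smoothed IDF. It is not textacy's implementation: the function name smooth_tfidf is hypothetical, and the formula log((n_docs + 1) / (df + 1)) + 1 is an assumption based on the common "add one" smoothing convention, so check the repository comments for the exact definition used by your installed version.

```python
import math
from collections import Counter


def smooth_tfidf(tokenized_docs):
    """Illustrative only: raw term counts weighted by a smoothed IDF.

    Assumed formula (not taken from textacy's source):
        idf(term) = log((n_docs + 1) / (df(term) + 1)) + 1
    """
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    # Adding one to the document frequency (and to n_docs) keeps the
    # division well defined for every term.
    idf = {
        term: math.log((n_docs + 1) / (count + 1)) + 1.0
        for term, count in df.items()
    }
    # Linear TF: the raw count of each term within each document.
    return [
        {term: tf * idf[term] for term, tf in Counter(doc).items()}
        for doc in tokenized_docs
    ]


docs = [["cat", "sat", "cat"], ["dog", "sat"]]
weights = smooth_tfidf(docs)
```

In this toy corpus, "sat" appears in every document and so keeps a weight equal to its raw count, while "cat" appears in only one document and is weighted up.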