python, nlp, doc2vec

Error: 'module' object is not callable in Doc2Vec


I am trying to fit a Doc2Vec model on a dataframe whose first column holds the texts and whose second column holds the label (author). I found this article, https://towardsdatascience.com/multi-class-text-classification-with-doc2vec-logistic-regression-9da9947b43f4, which is really helpful. However, I am stuck on how to build the model:

import tqdm
cores = multiprocessing.cpu_count()
model_dbow = Doc2Vec(dm=0, vector_size=300, negative=5, hs=0, min_count=2, sample=0, workers=cores)
model_dbow.build_vocab([x for x in tqdm(train_tagged.values)])

TypeError: 'module' object is not callable

Could you please help me overcome this issue?

Before that, I also have this code:

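# Imports below are assumed from the linked tutorial; df is the dataframe described above
from sklearn.model_selection import train_test_split
from gensim.models.doc2vec import TaggedDocument
import nltk
from nltk.corpus import stopwords

train, test = train_test_split(df, test_size=0.3, random_state=42)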
def tokenize_text(text):
    tokens = []
    for sent in nltk.sent_tokenize(text):
        for word in nltk.word_tokenize(sent):
            if len(word) < 2:
                continue
            tokens.append(word.lower())
    return tokens
train_tagged = train.apply(
    lambda r: TaggedDocument(words=tokenize_text(r['text']), tags=[r.author]), axis=1)
test_tagged = test.apply(
    lambda r: TaggedDocument(words=tokenize_text(r['text']), tags=[r.author]), axis=1)

Edit: If I remove tqdm, the code works, but I am not sure whether this is acceptable. As far as I know, tqdm is a Python package that lets you create progress bars and estimate TTC (Time To Completion) for functions and loops, so removing it should not change the output. Right?

Edit2: See also the question "My Doc2Vec code, after many loops of training, isn't giving good results. What might be wrong?" for ways to improve the code of the tutorial. Thanks again, @gojomo.


Solution

  • You are importing the tqdm module, not the tqdm class inside it, so tqdm(...) ends up calling the module itself.

    Replace import tqdm

    with from tqdm import tqdm
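
The same mechanism can be reproduced without installing anything. The sketch below uses the standard library's pprint purely as a stand-in (an assumption made for illustration, not part of the original code), because its module and its function share one name, just as the tqdm module and the tqdm class do:

```python
import pprint  # binds the name `pprint` to the module, like `import tqdm`

try:
    pprint([1, 2, 3])  # attempts to call the module itself
except TypeError as exc:
    # Same kind of TypeError as in the question
    error_message = str(exc)
    print("TypeError:", error_message)

# Option 1: keep the module import and call the attribute explicitly,
# the equivalent of writing tqdm.tqdm(...)
pprint.pprint([1, 2, 3])

# Option 2: rebind the name to the callable inside the module,
# the equivalent of `from tqdm import tqdm`
from pprint import pprint

pprint([1, 2, 3])
```

As for the edit in the question: tqdm only wraps the iterable to display a progress bar and yields the same items, so removing it should not change what build_vocab receives.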