Tags: python, scikit-learn, logistic-regression, text-classification

Reuse a logistic regression object for different fitted models


I have a Pipeline object that I want to fit on different combinations of training data and labels, and then use each fitted model to make its own predictions. However, I believe that calling fit() again on the same classifier object overwrites the previously fitted model.

An example of my code is:

    from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline

    text_clf = Pipeline([
        ('vect', CountVectorizer(analyzer="word", tokenizer=None, preprocessor=None,
                                 stop_words=None, max_features=5000)),
        ('tfidf', TfidfTransformer(use_idf=True, norm='l2', sublinear_tf=True)),
        ('clf', LogisticRegression(solver='newton-cg', class_weight='balanced',
                                   multi_class='multinomial', fit_intercept=True)),
    ])

    print("Fitting the open multinomial BoW logistic regression model for probability models...\n")
    open_multi_logit_words = text_clf.fit(train_wordlist, train_property_labels)

    print("Fitting the open multinomial BoW logistic regression model w/", threshold, "MAPE threshold...\n")
    # Fails: Pipeline has no attribute 'copy', so text_clf.copy.deepcopy() does not exist
    open_multi_logit_threshold_words = (text_clf.copy.deepcopy()).fit(train_wordlist, train_property_labels_threshold)

However, the last line fails because a Pipeline has no copy attribute, let alone a deepcopy() method. How can I achieve what I need without having to define a separate pipeline like this:

    text_clf_open_multi_logit = Pipeline([
        ('vect', CountVectorizer(analyzer="word", tokenizer=None, preprocessor=None,
                                 stop_words=None, max_features=5000)),
        ('tfidf', TfidfTransformer(use_idf=True, norm='l2', sublinear_tf=True)),
        ('clf', LogisticRegression(solver='newton-cg', class_weight='balanced',
                                   multi_class='multinomial', fit_intercept=True)),
    ])

For all of my 16 classifier combinations?
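The overwriting described in the question can be checked directly. The sketch below is a minimal reproduction on made-up toy texts and labels (the data and the trimmed CountVectorizer arguments are assumptions for illustration, not the asker's setup). It relies on the fact that Pipeline.fit returns the pipeline itself, so both names point to one object and the second fit replaces the first model. The multi_class argument is left out because recent scikit-learn releases deprecate it and already use the multinomial loss for multiclass targets with solver='newton-cg'.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data standing in for train_wordlist and the two label lists.
train_wordlist = [
    "cheap flat near the station",
    "luxury villa with pool",
    "small cheap studio",
    "large luxury house with garden",
    "cheap room to rent",
    "luxury penthouse with view",
]
train_property_labels = ["budget", "premium", "budget", "premium", "budget", "premium"]
train_property_labels_threshold = [0, 1, 0, 1, 1, 0]

text_clf = Pipeline([
    ("vect", CountVectorizer(analyzer="word", max_features=5000)),
    ("tfidf", TfidfTransformer(use_idf=True, norm="l2", sublinear_tf=True)),
    ("clf", LogisticRegression(solver="newton-cg", class_weight="balanced")),
])

first = text_clf.fit(train_wordlist, train_property_labels)
classes_after_first_fit = first.named_steps["clf"].classes_.tolist()

# Refitting the same object: fit() returns self, so `second` is `first` is `text_clf`.
second = text_clf.fit(train_wordlist, train_property_labels_threshold)
classes_after_second_fit = first.named_steps["clf"].classes_.tolist()

print(first is second)            # True
print(classes_after_first_fit)    # ['budget', 'premium']
print(classes_after_second_fit)   # [0, 1]: the first model has been replaced
```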


Solution

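  • I would try copy.deepcopy from Python's standard copy module; it is a module-level function, not a method of the pipeline: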

    import copy

    text_clf0 = copy.deepcopy(text_clf)
    open_multi_logit_threshold_words = text_clf0.fit(train_wordlist, train_property_labels_threshold)


    EDIT: to get one independent copy for each of your 16 combinations, you can use a list:

    text_clf_list = [copy.deepcopy(text_clf) for _ in range(16)]


    or copy and fit in one step:

    open_multi_logit_threshold_words = copy.deepcopy(text_clf).fit(train_wordlist, train_property_labels_threshold)
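Putting the deepcopy approach together for several label sets: the sketch below, again on made-up toy data, fits one deep copy of a single template pipeline per label set and keeps the fitted models in a dict so each can be looked up by name. The label_sets dictionary and its keys are hypothetical stand-ins for the 16 combinations. Because copy.deepcopy copies the whole pipeline, fitting one copy leaves text_clf and every other copy untouched.

```python
import copy

from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Hypothetical toy data; replace with the real training texts and label lists.
train_wordlist = [
    "cheap flat near the station",
    "luxury villa with pool",
    "small cheap studio",
    "large luxury house with garden",
    "cheap room to rent",
    "luxury penthouse with view",
]
train_property_labels = ["budget", "premium", "budget", "premium", "budget", "premium"]
train_property_labels_threshold = [0, 1, 0, 1, 1, 0]

# The pipeline is defined once and never fitted itself; it only serves as a template.
text_clf = Pipeline([
    ("vect", CountVectorizer(analyzer="word", max_features=5000)),
    ("tfidf", TfidfTransformer(use_idf=True, norm="l2", sublinear_tf=True)),
    ("clf", LogisticRegression(solver="newton-cg", class_weight="balanced")),
])

# Hypothetical mapping from a combination name to its label list (16 entries in the question).
label_sets = {
    "property": train_property_labels,
    "threshold": train_property_labels_threshold,
}

# Each entry gets its own deep copy, so fitting one never touches the others.
fitted_models = {
    name: copy.deepcopy(text_clf).fit(train_wordlist, labels)
    for name, labels in label_sets.items()
}

for name, model in fitted_models.items():
    print(name, model.named_steps["clf"].classes_.tolist())
# property ['budget', 'premium']
# threshold [0, 1]
```

An alternative that the answer above does not mention is sklearn.base.clone(text_clf), which builds a new, unfitted pipeline with the same parameters. For an unfitted template like this one it behaves like copy.deepcopy; for an already fitted pipeline, clone drops the learned state while deepcopy carries it over until the copy is refitted.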