Tags: python, text, nlp, data-science

Affinity propagation did not converge, this model will not have any cluster centers


When I try to cluster using affinity propagation, I get the warning below and end up with only one cluster.

"...\anaconda\lib\site-packages\sklearn\cluster\_affinity_propagation.py:246: ConvergenceWarning: Affinity propagation did not converge, this model will not have any cluster centers.
  warnings.warn("Affinity propagation did not converge, this model ""

Below is the code I tried.

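from collections import Counter

from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

def build_feature_matrix(documents, feature_type='frequency',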
                         ngram_range=(1, 1), min_df=0.0, max_df=1.0):

    feature_type = feature_type.lower().strip()  
    
    if feature_type == 'binary':
        vectorizer = CountVectorizer(binary=True, min_df=min_df,
                                     max_df=max_df, ngram_range=ngram_range)
    elif feature_type == 'frequency':
        vectorizer = CountVectorizer(binary=False, min_df=min_df,
                                     max_df=max_df, ngram_range=ngram_range)
    elif feature_type == 'tfidf':
        vectorizer = TfidfVectorizer(min_df=min_df, max_df=max_df, 
                                     ngram_range=ngram_range)
    else:
        raise ValueError("Wrong feature type entered. Possible values: 'binary', 'frequency', 'tfidf'")

    feature_matrix = vectorizer.fit_transform(documents).astype(float)
    
    return vectorizer, feature_matrix

vectorizer, feature_matrix = build_feature_matrix(filtered_list_6,
                                                  feature_type='tfidf',
                                                  min_df=0.15, max_df=0.85,
                                                  ngram_range=(1, 2))

def affinity_propagation(feature_matrix):
    
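    # document-by-document similarity matrix
    sim = feature_matrix * feature_matrix.T
    # toarray() gives a plain ndarray; todense() returns np.matrix,
    # which recent scikit-learn versions reject
    sim = sim.toarray()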
    ap = AffinityPropagation()
    ap.fit(sim)
    clusters = ap.labels_          
    return ap, clusters

ap_obj, clusters = affinity_propagation(feature_matrix=feature_matrix)
df[len(df.columns)] = clusters

c = Counter(clusters)   
print(c.items())

total_clusters = len(c)
print('Total Clusters:', total_clusters)

Could someone point out what I am doing wrong here?

Thanks in advance!


Solution

  • Adjusting the damping, max_iter and preference values eliminated the warning for me. A reasonable starting point is damping = 0.9 and max_iter = 1000.

    The preference value can then be tuned as needed; changing it changes the number of clusters the model generates.
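
    As a rough illustration of the settings above, here is a minimal sketch. The toy corpus is hypothetical and only stands in for filtered_list_6, and random_state=0 is added just for repeatability. The sketch also passes affinity="precomputed", which the original code does not do; that is an assumption on my part, made because the input is already a similarity matrix. When the fit does not converge, scikit-learn labels every sample -1, which is why the Counter in the question reports a single "cluster".

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, standing in for the asker's filtered_list_6.
docs = [
    "cats purr and chase mice",
    "cats and kittens chase mice",
    "kittens purr at cats",
    "stocks rise as markets rally",
    "markets rally and stocks climb",
    "investors cheer as stocks rise",
]

tfidf = TfidfVectorizer().fit_transform(docs)
# TF-IDF rows are L2-normalised by default, so this product is cosine similarity.
sim = (tfidf @ tfidf.T).toarray()

ap = AffinityPropagation(
    affinity="precomputed",  # assumption: treat `sim` as similarities, not features
    damping=0.9,             # heavier damping to reduce oscillation
    max_iter=1000,           # allow more iterations before giving up
    preference=None,         # None means "use the median of the similarities"
    random_state=0,
)
ap.fit(sim)

labels = ap.labels_
n_clusters = len(ap.cluster_centers_indices_)
# No exemplars means the fit did not converge and every label is -1.
converged = n_clusters > 0
print("converged:", converged, "| clusters:", n_clusters, "| labels:", labels)

# Try a few preference values to see how the number of clusters responds.
counts = {}
for pref in (float(np.min(sim)), float(np.median(sim)), float(np.max(sim))):
    trial = AffinityPropagation(affinity="precomputed", damping=0.9,
                                max_iter=1000, preference=pref, random_state=0)
    trial.fit(sim)
    counts[pref] = len(trial.cluster_centers_indices_)
print(counts)
```

    Higher preference values tend to produce more clusters and lower values fewer, so checking the count for a few candidates like this is a cheap way to pick one. The exact numbers depend on your data.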