python scikit-learn cluster-analysis unsupervised-learning

Results from transformation with Feature Agglomeration


I am using the transform method of scikit-learn's FeatureAgglomeration object on a data matrix. In the resulting matrix (X_reduced in the code below), is the first column the agglomeration of cluster 0, the second column that of cluster 1, and so on? Or is the column order arbitrary?

from sklearn import cluster
agglo = cluster.FeatureAgglomeration(n_clusters=100)
agglo.fit(X_train_prepared)
X_reduced = agglo.transform(X_train_prepared)

Solution

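  • Yes. Relative to the labeling in agglo.labels_, column k of the transformed matrix corresponds to cluster k, so the first column is cluster 0. With the default pooling function (np.mean), each column is the mean of the input features assigned to that cluster.

    That is, the following holds: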

    import numpy as np
    from sklearn import cluster
    from sklearn.datasets import make_blobs
    agglo = cluster.FeatureAgglomeration(n_clusters=95)
    X, y = make_blobs(n_features=100)
    agglo.fit(X)
    X_reduced = agglo.transform(X)
    # the first column of X_reduced is the mean of all columns of X labeled as cluster 0
    # (np.allclose rather than == so that floating-point rounding cannot break the check)
    >>> np.allclose(X_reduced[:, 0], X[:, (agglo.labels_ == 0).nonzero()[0]].mean(axis=1))
    True
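The check above only looks at column 0. As a rough sketch of how the same idea could be confirmed for every cluster, the snippet below rebuilds each output column by hand from agglo.labels_ and compares the result with what transform returns. The data shape, n_clusters=5, and the names manual, columns_match and members are illustrative assumptions, not part of the original question, and the comparison assumes the default pooling_func (np.mean).

```python
import numpy as np
from sklearn import cluster
from sklearn.datasets import make_blobs

# Synthetic data (assumed for illustration): 100 samples, 20 features.
X, _ = make_blobs(n_features=20, random_state=0)

# Default pooling_func is np.mean, which the manual rebuild below relies on.
agglo = cluster.FeatureAgglomeration(n_clusters=5)
agglo.fit(X)
X_reduced = agglo.transform(X)

# Rebuild every output column by hand: column k is taken to be the mean of
# the input features whose entry in agglo.labels_ equals k.
manual = np.column_stack(
    [X[:, agglo.labels_ == k].mean(axis=1) for k in range(agglo.n_clusters)]
)

# Compare with a tolerance, since the two computations may round differently.
columns_match = np.allclose(X_reduced, manual)
print(columns_match)

# Map each output column to the original feature indices that feed it.
members = {
    k: np.flatnonzero(agglo.labels_ == k).tolist()
    for k in range(agglo.n_clusters)
}
print(members)
```

If columns_match comes out true, the column order follows the cluster labels rather than being arbitrary, and members tells you which original features were merged into each column of X_reduced.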