python scikit-learn cluster-analysis unsupervised-learning

Results from transformation with Feature Agglomeration

I am using the transform method of scikit-learn's FeatureAgglomeration object on a data matrix. In the resulting matrix (X_reduced in the code below), is the first column the agglomeration of cluster 0, the second column that of cluster 1, and so on? Or is the order arbitrary?

from sklearn import cluster
agglo = cluster.FeatureAgglomeration(n_clusters=100)
agglo.fit(X_train_prepared)
X_reduced = agglo.transform(X_train_prepared)

Solution

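Yes. The columns follow the labeling in agglo.labels_: the first column of the resulting matrix corresponds to cluster 0, the second to cluster 1, and so on. With the default pooling_func (np.mean), each column is the mean of the input features assigned to that cluster.

That is, the following holds: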

import numpy as np
from sklearn import cluster
from sklearn.datasets import make_blobs

agglo = cluster.FeatureAgglomeration(n_clusters=95)
X, y = make_blobs(n_features=100)
agglo.fit(X)
X_reduced = agglo.transform(X)
# The first column of X_reduced is the mean of all columns of X assigned to cluster 0.
# np.allclose is used instead of == to tolerate floating-point rounding.
>>> np.allclose(X_reduced[:, 0], X[:, agglo.labels_ == 0].mean(axis=1))
True
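
The check above only covers cluster 0. As a rough sketch, the same comparison can be extended to every column by rebuilding the reduced matrix by hand from agglo.labels_. The sample count, feature count, number of clusters, and random_state below are arbitrary values chosen for illustration, and the default mean pooling is assumed:

```python
import numpy as np
from sklearn import cluster
from sklearn.datasets import make_blobs

# Small synthetic data set; the sizes here are illustrative assumptions.
X, _ = make_blobs(n_samples=50, n_features=20, random_state=0)

agglo = cluster.FeatureAgglomeration(n_clusters=5)
agglo.fit(X)
X_reduced = agglo.transform(X)

# Rebuild each output column manually: column k is the mean of the
# input features whose label in agglo.labels_ equals k.
expected = np.column_stack([
    X[:, agglo.labels_ == k].mean(axis=1)
    for k in range(agglo.n_clusters)
])

# Compare with a tolerance rather than exact equality.
columns_match = np.allclose(X_reduced, expected)
print(columns_match)
```

If a different pooling_func is passed to FeatureAgglomeration, the same column ordering should apply, but the mean in the manual reconstruction would need to be replaced with that function.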