Tags: python, cluster-analysis, categorical-data, multi-dimensional-scaling

Perform Multidimensional Scaling (MDS) for clustered categorical data in Python


I am currently working on clustering categorical attributes that come from a bank marketing dataset from Kaggle. I have created the three clusters with kmodes:

Output: cluster_df

Now I want to visualize each row of a cluster as a projection or point so that I get some kind of image:

Desired visualization

I am having a hard time with this. Euclidean distance makes no sense for categorical data, right? Is there then no way to create the desired visualization?


Solution

  • A common way to visualize clusters is PCA. You can use PCA to reduce the multi-dimensional data to 2 dimensions so that you can plot it and hopefully understand the data better. To use it, see the following code:

    import pandas as pd
    from sklearn.decomposition import PCA

    # Project x onto its first two principal components
    pca = PCA(n_components=2)
    principal_components = pca.fit_transform(x)
    principal_df = pd.DataFrame(
        data=principal_components,
        columns=['principal component 1', 'principal component 2'],
    )
    

    where x is your data in numeric form (for example, the one-hot encoded categorical columns, since PCA cannot work on raw category labels). You can now plot the clustered data easily, since it is 2-dimensional.
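
To make this concrete, here is a minimal sketch of two ways to get 2D coordinates for categorical rows. The first one-hot encodes the columns and runs PCA. The second runs MDS on a precomputed Hamming dissimilarity (the fraction of attributes in which two rows differ), which is close in spirit to the matching dissimilarity that k-modes uses, so no Euclidean distance on raw categories is needed. The small `cluster_df` below, its column names, and the helper `plot_embedding` are made up for illustration and are not the asker's actual data. Parameter names of `MDS` have changed between scikit-learn releases, so check the documentation of your installed version if `dissimilarity="precomputed"` raises a warning.

```python
import pandas as pd
from scipy.spatial.distance import pdist, squareform
from sklearn.decomposition import PCA
from sklearn.manifold import MDS

# Hypothetical stand-in for the clustered data; columns and values are made up.
cluster_df = pd.DataFrame({
    "job": ["admin", "technician", "admin", "services", "technician", "services"],
    "marital": ["married", "single", "married", "single", "single", "divorced"],
    "education": ["secondary", "tertiary", "secondary", "primary", "tertiary", "primary"],
    "cluster": [0, 1, 0, 2, 1, 2],
})
features = cluster_df.drop(columns="cluster")

# Option 1: one-hot encode (one 0/1 column per category), then PCA.
x = pd.get_dummies(features).to_numpy(dtype=float)
pca_coords = PCA(n_components=2).fit_transform(x)

# Option 2: MDS on a precomputed Hamming dissimilarity matrix.
# Integer codes are only used to test equality between rows, not as magnitudes.
codes = features.apply(lambda col: col.astype("category").cat.codes).to_numpy()
hamming = squareform(pdist(codes, metric="hamming"))
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
mds_coords = mds.fit_transform(hamming)


def plot_embedding(coords, labels, title):
    """Scatter-plot 2D coordinates, colored by cluster label."""
    import matplotlib.pyplot as plt  # imported here so the rest runs without it

    fig, ax = plt.subplots()
    scatter = ax.scatter(coords[:, 0], coords[:, 1], c=labels, cmap="viridis")
    ax.legend(*scatter.legend_elements(), title="cluster")
    ax.set_title(title)
    return fig


# Example usage (uncomment to draw the plots):
# plot_embedding(pca_coords, cluster_df["cluster"], "PCA on one-hot data")
# plot_embedding(mds_coords, cluster_df["cluster"], "MDS on Hamming distance")
```

Rows with identical categories end up on the same point in both plots, so for a large dataset it can help to add a little jitter or to plot a sample of the rows.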