
Motivation for k-medoids


Why would one use the k-medoids algorithm rather than k-means? Is it only that the range of distance metrics that can be used with k-means is very limited, or is there something more?

Is there an example of data for which it makes much more sense to choose the best representatives of a cluster from the data itself rather than from R^n?


Solution

  • Why would we use k-medoids instead of k-means in the case of (squared) Euclidean distance?

    1. Technical justification

    For relatively small data sets (since k-medoids has higher computational complexity), k-medoids produces a clustering that is more robust to noise and outliers.

    Example 2D data illustrating this:

    The graph on the left shows the clusters obtained with K-medoids (sklearn_extra.cluster.KMedoids in Python with default options) and the one on the right the clusters obtained with K-means, for K=2. Blue crosses mark the cluster centers.

    The Python code used to generate the green points:

    import numpy as np
    import matplotlib.pyplot as plt
    
    rng = np.random.default_rng(seed=32)
    
    # Four groups of points: a few widely scattered points (a), two dense clouds (b, c)
    # and a small group near the origin (d)
    a = rng.random((6, 2)) * 2.35 - 3 * np.ones((6, 2))
    b = rng.random((50, 2)) * 0.25 - 2 * np.ones((50, 2))
    c = rng.random((100, 2)) * 0.5 - 1.5 * np.ones((100, 2))
    d = rng.random((7, 2)) * 0.55
    
    points = np.concatenate((a, b, c, d))
    plt.plot(points[:, 0], points[:, 1], "g.", markersize=8, alpha=0.3)  # green points
    plt.show()
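    A minimal sketch of how the two clusterings above could be reproduced on the generated points, assuming scikit-learn and scikit-learn-extra are installed (the random_state value is an illustrative choice; K=2 and the default options follow the figure):
    
    from sklearn.cluster import KMeans
    from sklearn_extra.cluster import KMedoids
    
    # Fit both clusterings with K=2 on the points generated above
    kmedoids = KMedoids(n_clusters=2, random_state=32).fit(points)
    kmeans = KMeans(n_clusters=2, random_state=32).fit(points)
    
    # K-medoids centers are actual data points (medoids), while k-means centers
    # are mean vectors that need not coincide with any point in the data
    print("k-medoids centers:\n", kmedoids.cluster_centers_)
    print("k-means centers:\n", kmeans.cluster_centers_)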

    2. Business case justification

    Here are some example business cases showing why we would prefer k-medoids. They mostly come down to the interpretability of the results and the fact that in k-medoids the resulting cluster centers are members of the original dataset.

    2.1 We have a recommender engine based only on user-item preference data and want to recommend to a user the items (e.g. movies) that other, similar people enjoyed. We assign the user to his/her closest cluster and recommend the top movies that the cluster representative (an actual person) watched. If the cluster representative were not an actual person, we would have no history of actually watched movies to draw recommendations from, and we would each time have to search additionally for, e.g., the closest person in the cluster. Example data: the classic MovieLens 1M Dataset.

    2.2 We have a database of patients and want to pick a small representative group of size K to test a new drug on. After clustering the patients with K-medoids, the cluster representatives (the medoids) are invited to the drug trial, as sketched below.
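
    In both cases the key property is that the medoids are actual members of the dataset, so their row indices can be looked up directly. A minimal sketch of the drug-trial case, with a purely hypothetical numeric patient feature matrix (the data and the value of K are illustrative):
    
    import numpy as np
    from sklearn_extra.cluster import KMedoids
    
    # Hypothetical data: one row per patient, columns are numeric features
    rng = np.random.default_rng(seed=0)
    patients = rng.random((200, 5))
    
    # Cluster the patients and take the medoids as the representative group
    K = 10
    kmedoids = KMedoids(n_clusters=K, random_state=0).fit(patients)
    representative_rows = kmedoids.medoid_indices_  # indices of actual patients
    print("Invite the patients in rows:", representative_rows)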