Given a list of pairs of one-dimensional coordinates (or segments) like the following:
[1]: 1200, 1210
[2]: 1212, 1222
[3]: 1190, 1200
[4]: 300, 310
...
[n]: 800, 810
(where the center of each pair can be taken to represent the element), I want to know which algorithm, or what kind of algorithm, I can use to find "hotspots" or clusters.
A hotspot is a segment that contains a certain number of items (let's say k).
For example, [3], [1] and [2] would belong to the same group, and the resulting list would be something like:
[1']: 1190, 1222 ([1], [2], [3])
(begin, end, contained elements)
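One straightforward reading of this definition (an illustration only, not the approach taken in the answer below) is to sort the segments by their start, merge a segment into the current group when it lies within some tolerance of the group's end, and keep only groups with at least k members. The function name `find_hotspots` and the `max_gap` tolerance are hypothetical; with `max_gap=5` and `k=3` it reproduces the [1'] group from the example.

```python
def find_hotspots(segments, max_gap, k):
    """Merge segments lying within `max_gap` of each other; keep groups of at least k segments.

    Hypothetical helper. Returns (begin, end, members) tuples, where members are
    1-based indices matching the [1]..[n] labels in the question.
    """
    # Visit segments in order of their start coordinate
    order = sorted(range(len(segments)), key=lambda i: segments[i][0])
    groups = []  # each group is [begin, end, member_indices]
    for i in order:
        begin, end = segments[i]
        if groups and begin - groups[-1][1] <= max_gap:
            # Close enough to (or overlapping) the current group: extend it
            groups[-1][1] = max(groups[-1][1], end)
            groups[-1][2].append(i + 1)
        else:
            # Too far from the current group: start a new one
            groups.append([begin, end, [i + 1]])
    return [(b, e, sorted(m)) for b, e, m in groups if len(m) >= k]


data = [(1200, 1210), (1212, 1222), (1190, 1200), (300, 310), (800, 810)]
print(find_hotspots(data, max_gap=5, k=3))  # [(1190, 1222, [1, 2, 3])]
```

The tolerance is the key assumption here: with `max_gap=1`, the gap of 2 between [1] and [2] splits them and no group reaches k=3.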
The problem is not very well defined, but maybe this will help.
KMeans is a way of clustering items by distance. Scikit-learn has an implementation that is quite easy to use; see http://scikit-learn.org/stable/auto_examples/cluster/plot_kmeans_digits.html#example-cluster-plot-kmeans-digits-py for an example.
It lets you define the number of clusters you want to find. However, you cannot control how many points will end up in each cluster. Anyway, here's a small example:
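To make "clustering by distance" concrete, here is a minimal one-dimensional version of Lloyd's algorithm (the classic k-means iteration) in plain Python. It is a sketch, not scikit-learn's implementation: the function name `kmeans_1d` and the explicit `initial` centroids are assumptions made so the result is deterministic (scikit-learn picks starting centroids itself). Each pass assigns every point to its nearest centroid, then moves each centroid to the mean of its points, and stops once the centroids no longer change.

```python
def kmeans_1d(points, initial, max_iter=100):
    """Lloyd's k-means on 1-D data; `initial` holds the starting centroids (hypothetical parameter)."""
    centroids = list(initial)
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point
        labels = [min(range(len(centroids)), key=lambda c: abs(p - centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its points (keep it if it has none)
        new_centroids = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            new_centroids.append(sum(members) / len(members) if members else old)
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids


segments = [[1200, 1210], [1212, 1222], [1190, 1200], [300, 310], [800, 810]]
midpoints = [sum(s) / len(s) for s in segments]
labels, centroids = kmeans_1d(midpoints, initial=[300, 800, 1200])
print(labels)  # [2, 2, 2, 0, 1]
```

Note that the number of clusters is fixed by the length of `initial`, which is exactly the limitation mentioned above.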
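from sklearn.cluster import KMeans

data = [[1200, 1210], [1212, 1222], [1190, 1200], [300, 310], [800, 810]]
# KMeans expects a 2-D input of shape (n_samples, n_features):
# each segment becomes one sample whose single feature is its midpoint
centers = [[sum(x) / len(x)] for x in data]

clf = KMeans(n_clusters=3, n_init=10)
clf.fit(centers)

for points in data:
    center = sum(points) / len(points)
    # predict() also needs a 2-D input, hence [[center]]
    print(points, center, clf.predict([[center]]))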
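Output (the cluster numbers themselves are arbitrary and may differ between runs; only the grouping matters):
[1200, 1210] 1205.0 [1]
[1212, 1222] 1217.0 [1]
[1190, 1200] 1195.0 [1]
[300, 310] 305.0 [0]
[800, 810] 805.0 [2]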
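The output above gives cluster labels rather than the (begin, end, contained elements) list the question asks for. A small post-processing step, sketched below with a hypothetical helper `labels_to_hotspots`, groups the original segments by label, takes the smallest start and largest end of each group, and keeps only groups with at least k members. The labels passed in are the ones printed above; a labelling from any other clustering method would be handled the same way.

```python
from collections import defaultdict


def labels_to_hotspots(segments, labels, k):
    """Turn per-segment cluster labels into sorted (begin, end, members) tuples with >= k members."""
    groups = defaultdict(list)
    for index, label in enumerate(labels, start=1):  # 1-based, like [1]..[n] in the question
        groups[label].append(index)
    hotspots = []
    for members in groups.values():
        if len(members) >= k:
            begin = min(segments[i - 1][0] for i in members)
            end = max(segments[i - 1][1] for i in members)
            hotspots.append((begin, end, members))
    return sorted(hotspots)


data = [[1200, 1210], [1212, 1222], [1190, 1200], [300, 310], [800, 810]]
print(labels_to_hotspots(data, [1, 1, 1, 0, 2], k=3))  # [(1190, 1222, [1, 2, 3])]
```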
EDIT: Another algorithm provided in scikit-learn is Affinity Propagation, which doesn't require the number of clusters to be set beforehand. I don't know exactly how it works, but you should be able to find more information on it yourself.
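Roughly, Affinity Propagation passes two kinds of messages between points: "responsibilities" r(i, k), saying how well suited point k is to be the exemplar of point i, and "availabilities" a(i, k), saying how appropriate it would be for i to pick k. Points for which r(k, k) + a(k, k) > 0 become exemplars, and every other point joins its most similar exemplar. Below is a compact NumPy sketch of those updates, assuming negative squared distance as the similarity, the median similarity as every point's "preference" (the diagonal), and damping 0.5. It is a simplified illustration with a hypothetical name, `affinity_propagation_1d`; it omits the convergence check, tie-breaking noise and exemplar refinement that scikit-learn's version performs, so results can differ.

```python
import numpy as np


def affinity_propagation_1d(x, damping=0.5, iterations=200):
    """Simplified Affinity Propagation on 1-D data; returns (exemplar indices, labels)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    rows = np.arange(n)
    s = -(x[:, None] - x[None, :]) ** 2  # similarity: negative squared distance
    np.fill_diagonal(s, np.median(s))    # preference: median of the similarity matrix
    r = np.zeros((n, n))                 # responsibilities r(i, k)
    a = np.zeros((n, n))                 # availabilities a(i, k)
    for _ in range(iterations):
        # r(i,k) = s(i,k) - max over k' != k of (a(i,k') + s(i,k'))
        tmp = a + s
        best = np.argmax(tmp, axis=1)
        first = tmp[rows, best]
        tmp[rows, best] = -np.inf
        second = tmp.max(axis=1)
        r_new = s - first[:, None]
        r_new[rows, best] = s[rows, best] - second
        r = damping * r + (1 - damping) * r_new
        # a(i,k) = min(0, r(k,k) + sum of max(0, r(i',k)) over i' not in {i, k})
        # a(k,k) = sum of max(0, r(i',k)) over i' != k
        rp = np.maximum(r, 0)
        np.fill_diagonal(rp, np.diag(r))
        a_new = rp.sum(axis=0)[None, :] - rp
        self_avail = np.diag(a_new).copy()
        a_new = np.minimum(a_new, 0)
        np.fill_diagonal(a_new, self_avail)
        a = damping * a + (1 - damping) * a_new
    exemplars = np.flatnonzero(np.diag(a) + np.diag(r) > 0)
    if len(exemplars) == 0:
        return exemplars, np.full(n, -1)  # no exemplar emerged
    labels = np.argmax(s[:, exemplars], axis=1)    # join the most similar exemplar
    labels[exemplars] = np.arange(len(exemplars))  # each exemplar labels itself
    return exemplars, labels


exemplars, labels = affinity_propagation_1d([0, 1, 2, 100, 101, 102])
print(exemplars, labels)
```

The preference controls how many clusters appear: making it less negative produces more exemplars. Scikit-learn exposes this as the `preference` parameter of `AffinityPropagation`.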
Example:
from sklearn.cluster import AffinityPropagation
import numpy as np

data = [[1200, 1210], [1212, 1222], [1190, 1200], [300, 310], [800, 810]]
# One sample per segment, with its midpoint as the only feature
centers = np.array([[sum(x) / len(x)] for x in data])

clf = AffinityPropagation()
# fit_predict() fits the model and returns one cluster label per sample
for points, cluster in zip(data, clf.fit_predict(centers)):
    center = sum(points) / len(points)
    print(points, center, cluster)
Output:
[1200, 1210] 1205.0 0
[1212, 1222] 1217.0 0
[1190, 1200] 1195.0 0
[300, 310] 305.0 1
[800, 810] 805.0 2