I'm trying to plot points with both colors and labels. This is not the usual use case: Python users typically treat "labels" as categories. In my case I want the color to represent a feature, while the label is an identifier for the point itself. Here is a toy example:
x = [-0.01611772, 1.51755901, -0.64869352, -1.80850313, -0.11505037]
y = [ 0.04845168, -0.45576903, 0.62703651, -0.24415787, -0.41307092]
colors = ['b', 'g', 'r', 'b', 'r']
labels = ['Gioele', 'Felix', 'Elpi', 'Roro', 'Cacara']
I'd like to use the scatter function. According to the "quick" documentation:
def scatter(x, y, s=20, c=None, marker='o', cmap=None, norm=None, vmin=None, vmax=None, alpha=None, linewidths=None, verts=None, edgecolors=None, hold=None, data=None, **kwargs)
Inferred type: (x: Any, y: Any, s: int, c: Any, marker: unicode, cmap: Any, norm: Any, vmin: Any, vmax: Any, alpha: Any, linewidths: Any, verts: Any, edgecolors: Any, hold: Any, data: Any, kwargs: dict) -> Any
So my attempt was:
import pylab
pylab.scatter(x, y, c=colors, data=labels)
pylab.show()
but it seems to ignore the data=labels part.
In addition: supposing the labels can be plotted, is there a way to place them in a "smart" way, i.e. so that they don't hide each other? I need something similar to the R package ggrepel.
I think using plt.annotate is an option here. To take your example:
import matplotlib.pyplot as plt
x = [-0.01611772, 1.51755901, -0.64869352, -1.80850313, -0.11505037]
y = [ 0.04845168, -0.45576903, 0.62703651, -0.24415787, -0.41307092]
colors = ['b', 'g', 'r', 'b', 'r']
labels = ['Gioele', 'Felix', 'Elpi', 'Roro', 'Cacara']
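plt.scatter(x, y, c=colors)
# Label each point; xytext is the text offset (in points) from the marker
for label, xi, yi in zip(labels, x, y):
    plt.annotate(label, xy=(xi, yi), xytext=(5, 5), textcoords='offset points',
                 ha='left', va='bottom')
plt.show()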
This gives the following output:
Edit: I just spotted that you also asked about overlapping labels. This question seems to have a good solution. There is also apparently a piece of code on GitHub that is designed to emulate ggrepel.
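If you would rather not pull in extra code, here is a rough sketch of a greedy way to reduce overlaps using only matplotlib. It is not the approach from the linked question or the GitHub project, just an illustration: the helper name annotate_without_overlap and its step and max_tries parameters are made up for this example. Each label is placed with plt.annotate-style offsets and pushed upward in small steps while its bounding box still overlaps a label placed earlier.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt


def annotate_without_overlap(ax, x, y, labels, step=4, max_tries=50):
    """Hypothetical helper: annotate points, nudging overlapping labels upward."""
    fig = ax.figure
    fig.canvas.draw()  # make sure axis limits are settled before measuring text
    renderer = fig.canvas.get_renderer()
    placed = []        # bounding boxes (display coordinates) of labels placed so far
    annotations = []
    for label, xi, yi in zip(labels, x, y):
        ann = ax.annotate(label, xy=(xi, yi), xytext=(5, 5),
                          textcoords='offset points', ha='left', va='bottom')
        bbox = ann.get_window_extent(renderer)
        tries = 0
        # Raise the label by `step` points until it no longer overlaps earlier ones
        while any(bbox.overlaps(other) for other in placed) and tries < max_tries:
            dx, dy = ann.get_position()
            ann.set_position((dx, dy + step))
            bbox = ann.get_window_extent(renderer)
            tries += 1
        placed.append(bbox)
        annotations.append(ann)
    return annotations


if __name__ == "__main__":
    x = [-0.01611772, 1.51755901, -0.64869352, -1.80850313, -0.11505037]
    y = [0.04845168, -0.45576903, 0.62703651, -0.24415787, -0.41307092]
    colors = ['b', 'g', 'r', 'b', 'r']
    labels = ['Gioele', 'Felix', 'Elpi', 'Roro', 'Cacara']
    fig, ax = plt.subplots()
    ax.scatter(x, y, c=colors)
    annotate_without_overlap(ax, x, y, labels)
    fig.savefig("labelled_scatter.png")
```

This only moves labels vertically and only compares each label against the ones placed before it, so it is much cruder than ggrepel. It also measures text at the current figure size, so resizing the figure afterwards can reintroduce overlaps.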