python, visualization, k-means

How do I assign colors to clusters in kmeans?


I keep getting error messages from my k-means clustering code. Note: I am very new to all of this, and to coding in general, so I am also looking for any advice on how to improve. I also tried defining each color manually, but that did not work either.

    import folium
    import numpy as np
    import matplotlib.cm as cm
    import matplotlib.colors as colors

    # create map
    map_clusters = folium.Map(location=[latitude, longitude], zoom_start=11)

    # set color scheme for the clusters
    x = np.arange(kclusters)
    ys = [i + x + (i*x)**2 for i in range(kclusters)]
    colors_array = cm.rainbow(np.linspace(0, 1, len(ys)))
    rainbow = [colors.rgb2hex(i) for i in colors_array]

    # add markers to the map
    markers_colors = []
    for lat, lon, poi, cluster in zip(pittsburgh_merged['Latitude'],
                                      pittsburgh_merged['Longitude'],
                                      pittsburgh_merged['Neighborhood'],
                                      pittsburgh_merged['Cluster Labels']):
        label = folium.Popup(str(poi) + ' Cluster ' + str(cluster), parse_html=True)
        folium.CircleMarker(
            [lat, lon],
            radius=5,
            popup=label,
            color=rainbow[cluster-1],
            fill=True,
            fill_color=rainbow[cluster-1],
            fill_opacity=0.7).add_to(map_clusters)

    map_clusters

I expect map_clusters to display a map of Pittsburgh with the venues color-coded by cluster, hence the rainbow color assignment. However, I keep getting "TypeError: list indices must be integers or slices, not float" on the color and fill_color assignments.


Solution

  • The error means that the index you use to access the list rainbow is a float rather than an integer. Here you access element cluster - 1 of rainbow, so cluster - 1 evaluates to a float, which in turn means that the variable cluster holds a float rather than an int. Make sure you pass in an integer, for example by casting the variable:

    color = rainbow[int(cluster)-1]

    However, whether this works depends on the actual content of the variable: int() raises a ValueError if cluster is NaN and an OverflowError if it is infinite. In that case (and really in every case), take a look at your data and make sure it makes sense. k-means should produce integer cluster labels, so float or even NaN labels suggest that something went wrong earlier, either during clustering or when the labels were merged back into your data. A common cause is a merge or join that leaves some rows without a cluster label: pandas fills those with NaN and upcasts the whole column to float64. Inspect the pittsburgh_merged variable by printing its contents, and check pittsburgh_merged['Cluster Labels'] for missing values.
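A minimal sketch of that likely cause and one possible fix, using hypothetical toy data (the DataFrames neighborhoods, labels, merged and clean are made up for illustration and are not the asker's pittsburgh_merged). It shows how a left merge that leaves one row unlabeled turns the label column into float64, reproduces the TypeError, then drops the unlabeled row and casts the labels back to int before indexing the color list built the same way as in the question.

```python
import numpy as np
import pandas as pd
import matplotlib.cm as cm
import matplotlib.colors as colors

# Hypothetical toy data: neighborhood "C" has no cluster label, as can happen
# when a left merge finds no matching row for it.
neighborhoods = pd.DataFrame({"Neighborhood": ["A", "B", "C"]})
labels = pd.DataFrame({"Neighborhood": ["A", "B"], "Cluster Labels": [0, 1]})
merged = neighborhoods.merge(labels, on="Neighborhood", how="left")

# The missing label becomes NaN, which forces the whole column to float64
print(merged["Cluster Labels"].dtype)         # float64
print(merged["Cluster Labels"].isna().sum())  # 1

# Same color scheme as in the question: one hex color per cluster
kclusters = 2
colors_array = cm.rainbow(np.linspace(0, 1, kclusters))
rainbow = [colors.rgb2hex(c) for c in colors_array]

# Indexing the list with a float label reproduces the error
label = merged["Cluster Labels"].tolist()[0]  # 0.0, a Python float
try:
    rainbow[label]
except TypeError as err:
    print(err)  # list indices must be integers or slices, not float

# One possible fix: drop rows without a label, then cast the labels to int
clean = merged.dropna(subset=["Cluster Labels"]).astype({"Cluster Labels": int})

# The labels are now ints from 0 to kclusters - 1, so they index rainbow directly
point_colors = [rainbow[c] for c in clean["Cluster Labels"]]
print(point_colors)
```

This sketch indexes rainbow with the label itself rather than cluster - 1, assuming the labels run from 0 to kclusters - 1 as scikit-learn's KMeans produces them. With such labels, rainbow[cluster - 1] does not raise an error for label 0; it simply wraps around to the last color. Dropping unlabeled rows is only one option; depending on your data, you may prefer to fix the merge so that every row receives a label.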