I am running Python 3 and am implementing k-means clustering. I have written functions for the Euclidean distance and for assigning data. Now I want to update the assignment by writing a function that returns a new dictionary whose keys are the centroid names and whose values are lists of the points that belong to each centroid.
def update_assignment(data, centroids):
I know I need to reuse the assign_data function I created earlier. I want to do this without using numpy, but I am totally stuck and looking for suggestions. Do I need to iterate through the data again and use an if statement that compares each distance against the previous one? It seems like I should not need to call the previous distances again, since I have already written a function for that. Any help would be much appreciated.
Yes, you need to iterate through the data. Initialize a dictionary where each centroid is mapped to an empty list. Then, for each data point x, you can use a list comprehension to find the distances to each centroid, something like:
[euclidean_distance(x, c) for c in centroids]
The index of the smallest element in this list identifies the nearest centroid, which is the one x should now be assigned to. Then append x to the corresponding list in that dictionary.
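Putting those steps together, a minimal pure-Python sketch might look like the following. It assumes `data` is an iterable of coordinate tuples and `centroids` is a dict mapping each centroid name to its coordinate tuple; neither shape is confirmed in the question. The `euclidean_distance` here is a hypothetical stand-in for your own function, and because the signature of your `assign_data` is not shown, the sketch does not call it. If `assign_data` already returns the nearest centroid for a single point, it could replace the two lines that compute `distances` and `nearest`.

```python
import math


def euclidean_distance(p, q):
    # Hypothetical stand-in for the asker's own distance function:
    # square root of the summed squared coordinate differences.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def update_assignment(data, centroids):
    """Map each centroid name to the list of points closest to it.

    Assumes `data` is an iterable of coordinate tuples and `centroids`
    is a dict mapping a centroid name to its coordinate tuple.
    """
    names = list(centroids)
    # Start every centroid with an empty list so empty clusters still appear.
    assignment = {name: [] for name in names}
    for x in data:
        # Distance from x to each centroid, in the same order as `names`.
        distances = [euclidean_distance(x, centroids[name]) for name in names]
        # The index of the smallest distance picks the nearest centroid;
        # on a tie, list.index returns the first match.
        nearest = distances.index(min(distances))
        assignment[names[nearest]].append(x)
    return assignment


centroids = {"c1": (0, 0), "c2": (10, 10)}
data = [(1, 1), (9, 8), (0, 2), (11, 10)]
print(update_assignment(data, centroids))
# {'c1': [(1, 1), (0, 2)], 'c2': [(9, 8), (11, 10)]}
```

Building the dictionary up front with empty lists means a centroid that attracts no points still shows up as a key, which keeps the next centroid-update step from hitting a missing key.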