python numpy mnist centroid n-dimensional

Centroid of an N-dimensional dataset


I am new to Python and to programming in general, and my teacher gave me some work. Part of it is to work with the MNIST database of handwritten digits. Each digit is a vector of 728 components. The problem comes when I want to compute the centroid of each class, that is, the mean of all samples of a digit in each of the 728 dimensions. If I had two dimensions, I know I should do something like

avgx=(x1+x2+x3)/3

and so on... But I don't know how to do it with 728 dimensions. What I have tried is this:

labels = np.array(load_digits().target)
numbers = np.array(load_digits().data)
centroid=[]

i=0
count=[]
value=[0]*10
while(i<1):
    j=0
    
    value[i]=0
    
    while j<len(labels):
        
        if(labels[j]==i):
             count[i]=count[i]+1
             value[i]=value[i]+numbers[j]
             
        j=j+1
    
    valud=value[i]
    centroid.append(x/count[i] for x in valud)
    
    i=i+1

But it returns <generator object <genexpr> at 0x000002ADA1818F90> instead of a 728-dimensional vector, which would be the centroid of number 0, then of number 1, and so on...

EDIT: thanks to one answer, I modified the code to this:

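import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()  # load the data set once and reuse it
labels = np.array(digits.target)
numbers = np.array(digits.data)
centroid = []
i = 0
# First we need to calculate the centroid of each class
while i < 10:
    j = 0
    x = []  # collects every sample whose label is i
    while j < len(labels):
        if labels[j] == i:
            x.append(numbers[j])
        j = j + 1
    avg = np.array(x)
    # mean over the samples (axis 0) gives one value per dimension
    centroid.append(avg.mean(axis=0))
    i = i + 1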

And it works perfectly, thank you so much.


Solution

  • You are using NumPy arrays, so you should take advantage of everything NumPy has to offer.

    If you have an array of 10 vectors with 728 elements each:

    >>> import numpy as np
    >>> a = np.random.random((10,728))
    >>> a.shape
    (10, 728)
    

    Just take the mean along the first axis.

    >>> centroid = a.mean(axis=0)
    >>> centroid.shape
    (728,)
    >>>
    

    You should spend some time with the Absolute Beginners Tutorial and the other tutorials in the NumPy documentation.
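
The same idea extends to one centroid per class. The following is only a minimal sketch, not the asker's or the answerer's code: the helper name `class_centroids` and the tiny made-up data set are assumptions for illustration. It selects the rows of each class with a boolean mask and takes the mean along axis 0.

```python
import numpy as np

def class_centroids(data, labels):
    """Return (classes, centroids): one mean vector per distinct label."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # labels == c is a boolean mask selecting the rows of class c;
    # mean(axis=0) averages those rows component by component.
    centroids = np.array([data[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

# Tiny made-up data set: 4 samples, 3 dimensions, 2 classes.
data = np.array([[1.0, 2.0, 3.0],
                 [3.0, 4.0, 5.0],
                 [10.0, 10.0, 10.0],
                 [20.0, 30.0, 40.0]])
labels = np.array([0, 0, 1, 1])

classes, centroids = class_centroids(data, labels)
print(classes)    # [0 1]
print(centroids)  # row 0 is [2. 3. 4.], row 1 is [15. 20. 25.]
```

With the arrays from the question, the call would presumably be `class_centroids(numbers, labels)`. As for the `<generator object ...>` output: `centroid.append(x/count[i] for x in valud)` appends the generator expression itself rather than its values. Dividing the array directly, as in `valud / count[i]`, yields the vector.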