Tags: python, pandas, numpy, mean, pca

How to remove the "e" (scientific notation) from the output of a NumPy mean in Python


I'm new to Python and the NumPy library. I'm doing PCA on a custom dataset. I calculate the mean of each row of my pandas DataFrame, but I get the following result as the mean array:

[   7.433148e+46
    7.433148e+47
    7.433148e+47
    7.433148e+46
    7.433148e+46
    7.433148e+46
    7.433148e+46
    7.433148e+45
    7.433148e+47]

And my code is:

    import numpy as np

    np.set_printoptions(precision=6)
    np.set_printoptions(suppress=False)
    df['mean'] = df.mean(axis=1)  # row-wise mean; df is my pandas DataFrame
    mean_vector = np.array(df.iloc[:, 15], dtype=np.float64)

    print('Mean Vector:\n', mean_vector)

What do these numbers mean, and how can I remove the e from them?

Any help is really appreciated. Thanks in advance.


Solution

  • Are these large numbers realistic, and if so, how do you want to display them?

    Copying two of the values from your question:

    In [1]: x=np.array([7.433148e+46,7.433148e+47])
    

    The default NumPy display adds a few decimal places:

    In [2]: x
    Out[2]: array([  7.43314800e+46,   7.43314800e+47])
    

    Changing the precision doesn't change much:

    In [5]: np.set_printoptions(precision=6)
    In [6]: np.set_printoptions(suppress=True)
    
    In [7]: x
    Out[7]: array([  7.433148e+46,   7.433148e+47])
    

    suppress does even less here. It suppresses scientific notation for small floating-point values, not large ones. From the documentation:

    suppress : bool, optional
    Whether or not suppress printing of small floating point values using       
    scientific notation (default False).
    

    The default Python display for one of these numbers is also scientific:

    In [8]: x[0]
    Out[8]: 7.4331480000000002e+46
    

    With a formatting command we can display it in its 46+ character glory (or gory detail):

    In [9]: '%f'%x[0]
    Out[9]: '74331480000000001782664341808476383296708673536.000000'
    

    If that were a real value, I'd prefer to see the scientific notation:

    In [11]: '%.6g'%x[0]
    Out[11]: '7.43315e+46'
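
If the full positional form is really what you want, here is a minimal sketch using two NumPy helpers: np.format_float_positional for a single value, and the formatter argument of np.array2string for a whole array. The names plain and text are only for illustration. Keep in mind that a float64 holds roughly 15 to 17 significant decimal digits, so most of the digits in the '%f' output above come from the binary representation rather than from the data.

```python
import numpy as np

x = np.array([7.433148e+46, 7.433148e+47])

# Single value, positional (non-scientific) notation.
# trim='-' drops trailing zeros and the trailing decimal point.
plain = np.format_float_positional(x[0], trim='-')

# Whole array: the formatter applies to this call only and does not
# change the global print options.
text = np.array2string(x, formatter={'float_kind': lambda v: '%.0f' % v})

print(plain)
print(text)
```

Passing the same formatter dict to np.set_printoptions would make it the default for every array printed afterwards, which may or may not be what you want.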
    

    To illustrate what suppress does, print the element-wise reciprocal of this array (suppress=True is still in effect from above):

    In [12]: 1/x
    Out[12]: array([ 0.,  0.])
    
    In [13]: np.set_printoptions(suppress=False)
    
    In [14]: 1/x
    Out[14]: array([  1.345325e-47,   1.345325e-48])
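
The same comparison can be sketched without touching the global print options, assuming the suppress_small argument of np.array2string, which plays the role of suppress for a single call. The variable names are illustrative only.

```python
import numpy as np

x = np.array([7.433148e+46, 7.433148e+47])

# Large values: suppress_small has no effect, the output stays scientific.
large = np.array2string(x, precision=6, suppress_small=True)

# Tiny values: suppress_small=True prints them as zeros ...
small_suppressed = np.array2string(1 / x, precision=6, suppress_small=True)

# ... while suppress_small=False keeps the scientific notation.
small_plain = np.array2string(1 / x, precision=6, suppress_small=False)

print(large)
print(small_suppressed)
print(small_plain)
```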
    

    ===============

    I'm not that familiar with pandas, but I wonder if your mean calculation makes sense. What does pandas print for df.iloc[:,15]? For the mean to be this large, at least some of the original data has to be of similar size. How does the source display those values? I suspect that most of your values are smaller, normal values, and that you have a few excessively large ones (outliers) that distort the mean.
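
One way to test the outlier suspicion is to compare each row's mean with its median and its largest absolute value. This is only a sketch on made-up data: the DataFrame, its column names, and the 1e6 threshold are all assumptions for illustration, not taken from your dataset.

```python
import pandas as pd

# Hypothetical data: ordinary values plus one huge entry in the second row.
df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0],
    'b': [4.0, 1e48, 6.0],
    'c': [7.0, 8.0, 9.0],
})

row_mean = df.mean(axis=1)      # pulled up by the huge entry
row_median = df.median(axis=1)  # robust to a single extreme value
row_max = df.abs().max(axis=1)

# Flag rows whose largest value dwarfs the median (arbitrary threshold).
suspect_rows = df.index[row_max > 1e6 * row_median.abs()]

print(pd.DataFrame({'mean': row_mean, 'median': row_median, 'max': row_max}))
print('Suspect rows:', list(suspect_rows))
```

If the median is modest while the mean is astronomically large, a handful of entries are driving the result, and it is worth finding out where they come from before running PCA.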

    I think you can simplify the array extraction with .values; the second line below can replace the first (assuming the column is already float64):

    mean_vector = np.array(df.iloc[:,15],dtype=np.float64)
    mean_vector = df.iloc[:,15].values
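
Selecting the column by name rather than by position may also be safer, since position 15 only holds the mean if the frame had exactly 15 columns before 'mean' was added. A small sketch on a made-up 3x4 frame (the data and column names are hypothetical), using Series.to_numpy, which recent pandas versions offer alongside .values:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 4 data columns; 'mean' becomes column index 4.
df = pd.DataFrame(np.arange(12, dtype=np.float64).reshape(3, 4),
                  columns=list('abcd'))
df['mean'] = df.mean(axis=1)

by_position = df.iloc[:, 4].values  # depends on the column count
by_name = df['mean'].to_numpy()     # independent of column order

print(by_position)
print(by_name)
```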