I'm new to Python and the numpy library. I'm doing PCA on my custom dataset. I calculate the mean of each row of my pandas DataFrame, but I get the result below as the mean array:
[ 7.433148e+46 7.433148e+47 7.433148e+47 7.433148e+46 7.433148e+46
  7.433148e+46 7.433148e+46 7.433148e+45 7.433148e+47]
And my code is:
import numpy as np

np.set_printoptions(precision=6)
np.set_printoptions(suppress=False)
df['mean'] = df.mean(axis=1)  # mean of each row
mean_vector = np.array(df.iloc[:, 15], dtype=np.float64)  # position 15, presumably the new 'mean' column
print('Mean Vector:\n', mean_vector)
What do these numbers mean? And how can I get rid of the e in them?
Any help is really appreciated. Thanks in advance.
Are these large numbers realistic, and if so, how do you want to display them?
Copy and paste from your question:
In [1]: x=np.array([7.433148e+46,7.433148e+47])
The default numpy display adds a few decimal places:
In [2]: x
Out[2]: array([ 7.43314800e+46, 7.43314800e+47])
Changing the precision doesn't change much:
In [5]: np.set_printoptions(precision=6)
In [6]: np.set_printoptions(suppress=True)
In [7]: x
Out[7]: array([ 7.433148e+46, 7.433148e+47])
suppress does even less. It suppresses scientific notation for small floating point values, not for large ones:
suppress : bool, optional
Whether or not suppress printing of small floating point values using
scientific notation (default False).
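A minimal sketch to check this on your own machine, using np.array2string, which accepts the same precision and suppress_small options as set_printoptions. Exact spacing of the output differs between numpy versions, so only the notation matters here:

```python
import numpy as np

x = np.array([7.433148e+46, 7.433148e+47])

# Default formatting: scientific notation, because the values are huge.
default = np.array2string(x)

# precision only changes how many digits are shown; suppress_small only
# forces *small* values into fixed point. Values this large stay in
# scientific notation either way.
tweaked = np.array2string(x, precision=6, suppress_small=True)

print(default)
print(tweaked)
```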
The default Python display of one of these numbers is also scientific:
In [8]: x[0]
Out[8]: 7.4331480000000002e+46
With a formatting command we can display it in its 46+ character glory (or gory detail):
In [9]: '%f'%x[0]
Out[9]: '74331480000000001782664341808476383296708673536.000000'
If that were a real value, I'd prefer to see the scientific notation.
In [11]: '%.6g'%x[0]
Out[11]: '7.43315e+46'
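If you do want control over the exponent, Python's string formatting and numpy's own helpers cover most cases. A sketch, assuming Python 3.6+ (for f-strings) and numpy 1.14 or later (for np.format_float_positional); the custom array formatter is just one illustrative choice:

```python
import numpy as np

x = np.array([7.433148e+46, 7.433148e+47])
v = float(x[0])

# Python string formatting on a single value
print('%.6g' % v)   # printf style, as above -> 7.43315e+46
print(f"{v:.6g}")   # same spec with an f-string -> 7.43315e+46
print(f"{v:.3e}")   # always scientific, 3 decimals -> 7.433e+46
print(f"{v:.0f}")   # fixed point: every digit, no exponent

# numpy's positional formatter never uses an exponent
print(np.format_float_positional(v))

# Whole arrays without the exponent: pass a custom formatter.
# Only readable when the values are reasonably sized.
print(np.array2string(x, formatter={'float_kind': lambda val: f"{val:.0f}"}))
```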
To illustrate what suppress does, print the inverse of this array (suppress is still True from In [6]):
In [12]: 1/x
Out[12]: array([ 0., 0.])
In [13]: np.set_printoptions(suppress=False)
In [14]: 1/x
Out[14]: array([ 1.345325e-47, 1.345325e-48])
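The same comparison can be made without touching the global print settings by using the np.printoptions context manager (numpy 1.15 and later). A sketch:

```python
import numpy as np

x = np.array([7.433148e+46, 7.433148e+47])
inv = 1 / x  # tiny values, around 1e-47 and 1e-48

# Options apply only inside each with-block.
with np.printoptions(precision=6, suppress=True):
    suppressed = repr(inv)  # fixed point: the tiny values show as 0.

with np.printoptions(precision=6, suppress=False):
    scientific = repr(inv)  # tiny values kept in scientific notation

print(suppressed)
print(scientific)
```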
===============
I'm not that familiar with pandas, but I wonder if your mean calculation makes sense. What does pandas print for df.iloc[:,15]? For the mean to be this large, the original data has to contain values of similar size. How does the source display them? I wonder if most of your values are smaller, normal values, and you have a few excessively large ones (outliers) that 'distort' the mean.
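A small sketch of that effect with made-up numbers (the column names and values are hypothetical): a single huge entry dominates its row's mean, while the row median stays at a normal size, and a per-column maximum points at the culprit:

```python
import pandas as pd

# Hypothetical data: ordinary values plus one huge entry.
df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0],
    'b': [4.0, 5.0, 6.0],
    'c': [7.0, 8.0, 6.7e+47],  # a single outlier in the last row
})

row_mean = df.mean(axis=1)      # mean of each row
row_median = df.median(axis=1)  # median is far less sensitive to outliers

print(pd.DataFrame({'mean': row_mean, 'median': row_median}))

# Largest absolute value in each column, to find where the outlier lives
print(df.abs().max())
```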
I think you can simplify the array extraction with .values. Instead of
mean_vector = np.array(df.iloc[:,15],dtype=np.float64)
use
mean_vector = df.iloc[:,15].values
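In newer pandas versions, to_numpy() is the recommended alternative to .values. A sketch, assuming pandas 0.24 or later and a hypothetical two-column frame standing in for your data; selecting the column by name also avoids relying on it sitting at position 15:

```python
import numpy as np
import pandas as pd

# Hypothetical DataFrame standing in for the real data
df = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 5.0]})

df['mean'] = df.mean(axis=1)  # row means, as in the question

# Select by column name, then convert to a float64 numpy array.
mean_vector = df['mean'].to_numpy(dtype=np.float64)
print(mean_vector)
```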