python, pandas, scikit-learn, sklearn-pandas

In what order are PCA components printed? I need the coefficients to solve the PCA formula. How do I know which values are the betas?


I'm using sklearn's PCA. I need to solve:

pca1 = beta1*c1 + beta2*c2 + beta3*c3 + beta4*c4 + beta5*c5

I read in the documentation that the components are sorted by explained_variance_. How do I know which values are the betas?

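import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

d = {'c1': [3, 7, 1, 4], 'c2': [8, 2, 9, 5], 'c3': [0, 7, 9, 2], 'c4': [3, 5, 9, 1], 'c5': [4, 6, 8, 3]}
data = pd.DataFrame(data=d)
print("data:\n", data, "\n")
# Standardize each column to zero mean and unit variance before PCA
x = StandardScaler().fit_transform(data)
pca = PCA(n_components=1)
principalComponents = pca.fit_transform(x)
principalDf = pd.DataFrame(data=principalComponents, columns=['principal component 1'])
# components_ has one row per component and one column per input feature
print("\ncomponents: \n", pca.components_, "\n")
print("\nexplained_variance_\n", pca.explained_variance_, "\n")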

Result:

data:

+--+----+----+----+-----+----+
|  | c1 | c2 | c3 |  c4 | c5 |
|0 |  3 |  8 |  0 |  3  | 4  |
|1 |  7 |  2 |  7 |  5  | 6  |
|2 |  1 |  9 |  9 |  9  | 8  |
|3 |  4 |  5 |  2 |  1  | 3  |
+--+----+----+----+-----+----+

components:

[[-0.32703417  0.29320425  0.45731291  0.55565347  0.53776765]] 

explained_variance_:

[ 3.10207373] 

Solution

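  • The beta values are the entries of pca.components_. Each row of
    components_ is one principal component, and rows are sorted by
    explained_variance_ (largest first). Within a row, the values follow
    the column order of the input (c1, c2, c3, c4, c5). Because PCA was
    fit on x, the c's in the formula are the standardized columns, not
    the raw data. For the first component: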

    beta1 = -0.32703417

    beta2 = 0.29320425

    beta3 = 0.45731291

    beta4 = 0.55565347

    beta5 = 0.53776765
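
One way to check this mapping is to rebuild the first principal component by hand and compare it with what PCA returns. The sketch below reuses the question's data; the names `scores`, `betas` and `manual` are introduced here for illustration only. The overall sign of a component is arbitrary, so the printed signs may be flipped relative to the output above, depending on the scikit-learn version.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

d = {'c1': [3, 7, 1, 4], 'c2': [8, 2, 9, 5], 'c3': [0, 7, 9, 2],
     'c4': [3, 5, 9, 1], 'c5': [4, 6, 8, 3]}
data = pd.DataFrame(data=d)

# Same pipeline as in the question: standardize, then fit a 1-component PCA.
x = StandardScaler().fit_transform(data)
pca = PCA(n_components=1)
scores = pca.fit_transform(x)

# Label each coefficient with the input column it multiplies.
betas = pd.Series(pca.components_[0], index=data.columns)
print(betas)

# Rebuild pca1 by hand: subtract the mean PCA removed (close to zero here,
# since x is already standardized), then take the beta-weighted sum per row.
manual = (x - pca.mean_) @ pca.components_[0]
print(np.allclose(manual, scores[:, 0]))
```

If the final line prints True, the weighted sum of the standardized columns reproduces the values in principalDf, which confirms that components_[0] holds beta1 through beta5 in column order.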