I have a dataframe of shape 8000 x 1600, and I want to reduce the number of rows without changing the values. I tried PCA, but the values changed. Example:
a 10 20 30 40
b 20 70 40 50
c 10 00 80 40
d 20 30 99 50
e 10 20 30 40
f 59 30 40 50
g 10 20 30 40
h 90 30 40 50
i 91 20 34 18
into:
a 10 20 30 40
c 10 00 80 40
h 90 30 40 50
i 91 20 34 18
I think explained_variance_ratio_ could handle this with a for loop. Any help, please?
Unless I'm misunderstanding your problem, I think you're confusing the purpose of PCA (dimensionality reduction) with a simple dataframe manipulation to reduce the number of rows. These are very different things:
Dimensionality reduction, which you can get via PCA, would modify the values of your dataframe (that is the point). It is a useful, though not entirely straightforward, method of creating/extracting new features from your data for analysis, visualizing high-dimensional data, etc. Take a look at the Wikipedia pages on PCA and dimensionality reduction, and see if that is indeed what you want. If it is, I suggest you reformulate your question.
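To illustrate why PCA changes the values, here is a minimal sketch that runs PCA by hand with NumPy's SVD on the numeric part of your example. The variable names and the choice of 2 components are my own assumptions for illustration, not something from your code. Note that the number of rows stays the same; it is the columns that get reduced and replaced by new values.

```python
import numpy as np

# Numeric columns of the example data from the question (rows a..i).
X = np.array([
    [10, 20, 30, 40],
    [20, 70, 40, 50],
    [10,  0, 80, 40],
    [20, 30, 99, 50],
    [10, 20, 30, 40],
    [59, 30, 40, 50],
    [10, 20, 30, 40],
    [90, 30, 40, 50],
    [91, 20, 34, 18],
], dtype=float)

# PCA: center the data, then project onto the top principal directions.
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

n_components = 2  # assumed value, chosen only for this illustration
X_reduced = X_centered @ Vt[:n_components].T

print(X.shape)          # original: 9 rows, 4 columns
print(X_reduced.shape)  # after PCA: still 9 rows, but only 2 new columns
print(X_reduced[:3])    # projected coordinates, not the original values
```

The projected coordinates are linear combinations of the centered columns, which is why none of the original numbers survive.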
Reducing the number of rows is something completely different, and is very straightforward in pandas. Based on your example, it looks like you want to extract a number of random rows, without modification, from your dataframe. This can be done with df.sample().
For example, on the data you posted, the following selects 4 random rows:
>>> df.sample(4)
0 1 2 3 4
0 a 10 20 30 40
2 c 10 0 80 40
7 h 90 30 40 50
5 f 59 30 40 50
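For completeness, here is a self-contained sketch that rebuilds your example data and samples from it. The way I construct the dataframe and the random_state value are assumptions on my part (random_state just makes the selection repeatable); your real 8000-row dataframe would be used in place of df.

```python
import pandas as pd

# Rebuild the example data from the question, with default integer columns.
df = pd.DataFrame([
    ["a", 10, 20, 30, 40],
    ["b", 20, 70, 40, 50],
    ["c", 10,  0, 80, 40],
    ["d", 20, 30, 99, 50],
    ["e", 10, 20, 30, 40],
    ["f", 59, 30, 40, 50],
    ["g", 10, 20, 30, 40],
    ["h", 90, 30, 40, 50],
    ["i", 91, 20, 34, 18],
])

# Pick 4 random rows; random_state is optional and only fixes the seed.
subset = df.sample(n=4, random_state=0)

# The sampled rows are copied unchanged from the original dataframe.
print(subset)
print(subset.equals(df.loc[subset.index]))
```

If you would rather keep a proportion of the rows than a fixed count, df.sample also accepts frac (for example frac=0.1 to keep roughly 10% of the rows).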