I'm performing a PCA using scikit-learn in Python 3.
However, after I run my code, the principal components of the last row have "off" values. I know for a fact that the last row is correct.
I plotted three PCAs to visualize the problem. In the first plot (the full dataset), the "Sample" point plots as expected, but in the second and third plots, where I remove populations (parts of the full dataset), the sample plots in a "weird" position.
The dataframe with computed principal components (see last row):
principal_component_1 principal_component_2 Sample_name Population
0 3.279363 -0.288892 HG02291 American_Ancestry
1 3.625035 -0.296081 HG02275 American_Ancestry
2 3.870248 -0.264558 HG02272 American_Ancestry
3 3.118460 -0.272594 HG02271 American_Ancestry
4 2.811992 -0.376418 HG02259 American_Ancestry
... ... ... ... ...
1590 1.849372 -0.167314 HGDP00555 Oceanian_Ancestry
1591 1.666233 -0.224749 HGDP00556 Oceanian_Ancestry
1592 1.983947 -0.202254 HGDP00552 Oceanian_Ancestry
1593 2.202948 -0.210858 HGDP00554 Oceanian_Ancestry
1594 -4.693172 126.672265 Sample Sample
The code that I use:
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def do_pca(pca_data, sample_name, pops):
    """
    This function computes the PCA data from the sample and dataset for a PCA plot
    """
    # initialize variables for the PCA plot
    pops = pops + ["Sample"]
    pca_df = pd.read_csv(pca_data, sep=";")
    pca_df = pca_df[pca_df["Population"].isin(pops)].reset_index()
    features = list(pca_df.columns.values)
    features.remove("Population")
    features.remove("Sample_name")
    x = pca_df.loc[:, features].values  # Separating out the features
    y = pca_df.loc[:, ["Population", "Sample_name"]]  # Separating out the target
    x = StandardScaler().fit_transform(x)  # Standardizing the features
    # initialize PCA plot
    dot_size = 20
    pca = PCA(n_components=2)
    pc = pca.fit_transform(x)
    pc_df = pd.DataFrame(data=pc, columns=["principal_component_%s" % (i + 1) for i in range(2)])
    pc_df["Sample_name"] = y["Sample_name"]
    pc_df["Population"] = y["Population"]
    return pc_df
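One thing that may be worth checking in the function above: `reset_index()` without `drop=True` keeps the old row index as a new column named "index", and since the feature list is built from every column except "Population" and "Sample_name", that column would be passed to the scaler and the PCA as well. Whether this explains the odd value for the sample is only a guess. The sketch below uses a made-up toy frame (the columns `g1`, `g2`, the values, and the helper `feature_columns` are all hypothetical, not the real data) to show the difference:

```python
import pandas as pd

# Toy stand-in for the real CSV; columns g1/g2 and all values are invented.
toy = pd.DataFrame({
    "g1": [0.1, 0.2, 0.3, 0.4],
    "g2": [1.0, 0.9, 0.8, 0.7],
    "Sample_name": ["A", "B", "C", "Sample"],
    "Population": ["Pop1", "Pop2", "Pop1", "Sample"],
})
pops = ["Pop1", "Sample"]


def feature_columns(df, drop_index):
    """Hypothetical helper: filter by population, then list the feature columns."""
    kept = df[df["Population"].isin(pops)].reset_index(drop=drop_index)
    return [c for c in kept.columns if c not in ("Population", "Sample_name")]


with_index = feature_columns(toy, drop_index=False)
without_index = feature_columns(toy, drop_index=True)
print(with_index)     # feature list when the old index is kept as a column
print(without_index)  # feature list when the old index is dropped
```

If the first list contains "index" for the real data too, the row position is being treated as a feature, and its value for the last row changes whenever populations are filtered out.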
Can someone explain what I am doing wrong? Is my code off?
I found a similar question on StackOverflow, but it doesn't have an answer: link
try turning it off and on again :/