I'm looking at the attributes of skbio's PCoA method (listed below). I am new to this API and I want to get the eigenvectors and the original points projected onto the new axes, similar to .fit_transform in sklearn.decomposition.PCA, so I can create some PC_1 vs PC_2-style plots. I figured out how to get the eigvals and proportion_explained, but features comes back as None.
Is that because it's in beta?
Pointers to any tutorials that use this would be greatly appreciated. I am a huge fan of scikit-learn and would like to start using more of scikit's products.
| Attributes
| ----------
| short_method_name : str
| Abbreviated ordination method name.
| long_method_name : str
| Ordination method name.
| eigvals : pd.Series
| The resulting eigenvalues. The index corresponds to the ordination
| axis labels
| samples : pd.DataFrame
| The position of the samples in the ordination space, row-indexed by the
| sample id.
| features : pd.DataFrame
| The position of the features in the ordination space, row-indexed by
| the feature id.
| biplot_scores : pd.DataFrame
| Correlation coefficients of the samples with respect to the features.
| sample_constraints : pd.DataFrame
| Site constraints (linear combinations of constraining variables):
| coordinates of the sites in the space of the explanatory variables X.
| These are the fitted site scores
| proportion_explained : pd.Series
| Proportion explained by each of the dimensions in the ordination space.
| The index corresponds to the ordination axis labels
Here is my code to generate the principal coordinates analysis (PCoA) object.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn import decomposition
import seaborn as sns; sns.set_style("whitegrid", {'axes.grid' : False})
import skbio
from scipy.spatial import distance
%matplotlib inline
np.random.seed(0)
# Iris dataset
DF_data = pd.DataFrame(load_iris().data,
                       index = ["iris_%d" % i for i in range(load_iris().data.shape[0])],
                       columns = load_iris().feature_names)
n, m = DF_data.shape
# print(n,m)
# 150 4
Se_targets = pd.Series(load_iris().target,
                       index = ["iris_%d" % i for i in range(load_iris().data.shape[0])],
                       name = "Species")
# Scaling mean = 0, var = 1
DF_standard = pd.DataFrame(StandardScaler().fit_transform(DF_data),
                           index = DF_data.index,
                           columns = DF_data.columns)
# Distance Matrix
Ar_dist = distance.squareform(distance.pdist(DF_standard.T, metric="braycurtis")) # (m x m) distance measure
DM_dist = skbio.stats.distance.DistanceMatrix(Ar_dist, ids=DF_standard.columns)
PCoA = skbio.stats.ordination.pcoa(DM_dist)
You can access the transformed sample coordinates with OrdinationResults.samples. This will return a pandas.DataFrame row-indexed by sample ID (i.e. the IDs in your distance matrix). Since principal coordinate analysis operates on a distance matrix of samples, transformed feature coordinates (OrdinationResults.features) are not available. Other ordination methods in scikit-bio accepting a sample x feature table as input will have the transformed feature coordinates available (e.g. CA, CCA, RDA).
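To illustrate why only sample coordinates come out of PCoA, here is a minimal NumPy sketch of the classical procedure (double-centering the squared distances, then an eigendecomposition). It is an illustration under the assumption of Euclidean distances, not scikit-bio's actual implementation, and classical_pcoa is a hypothetical helper name.

```python
import numpy as np

def classical_pcoa(dist):
    """Hypothetical helper: classical PCoA on a square (n x n) distance matrix."""
    dist = np.asarray(dist, dtype=float)
    n = dist.shape[0]
    # Double-center the squared distances (Gower centering).
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (dist ** 2) @ centering
    # Eigendecomposition of the symmetric centered matrix.
    eigvals, eigvecs = np.linalg.eigh(gram)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Keep only axes with clearly positive eigenvalues.
    keep = eigvals > 1e-10 * eigvals.max()
    # One row of coordinates per sample: eigenvectors scaled by sqrt(eigenvalue).
    coords = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    return eigvals[keep], coords

# Toy data (an assumption for illustration): 6 samples, 3 features.
rng = np.random.default_rng(0)
points = rng.normal(size=(6, 3))
diff = points[:, None, :] - points[None, :, :]
dist = np.sqrt((diff ** 2).sum(axis=-1))  # Euclidean distances between samples

eigvals, coords = classical_pcoa(dist)
proportion_explained = eigvals / eigvals.sum()
```

The function only ever sees the n x n distances, never the feature table, so the result has one row per sample and nothing that could be called a feature coordinate.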
Side note: the distance.squareform call is unnecessary because skbio.DistanceMatrix supports square- or vector-form arrays.