I'm trying to replicate the PCA example found here, but when I run pca_summary() I get the following error. Any thoughts would be much appreciated. Thanks!
raise TypeError("data argument can't be an iterator")
TypeError: data argument can't be an iterator
This is a common problem caused by zip. In Python 3, zip returns an iterator instead of a list, and pd.DataFrame refuses an iterator as its data argument.
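To see the behaviour in isolation, here is a minimal sketch using only the standard library. The values in a, b and c are made up for illustration and are not taken from the PCA example.

```python
# Made-up sample values standing in for sdev, varprop and cumprop
a = [1.0, 2.0]
b = [0.6, 0.4]
c = [0.6, 1.0]

rows = zip(a, b, c)

# An iterator returns itself from iter(); a list would not
is_iterator = iter(rows) is rows

# Wrapping the call in list() materializes the tuples,
# which is the form pd.DataFrame accepts
materialized = list(zip(a, b, c))

print(is_iterator)
print(materialized)
```

In Python 2 the same zip call returned a list directly, which is why older tutorials pass it straight to pd.DataFrame.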
In the pca_summary function, do this:

    import numpy as np
    import pandas as pd
    # Assumption: running in Jupyter/IPython, where display is available
    from IPython.display import display

    def pca_summary(pca, standardised_data, out=True):
        names = ["PC"+str(i) for i in range(1, len(pca.explained_variance_ratio_)+1)]
        a = list(np.std(pca.transform(standardised_data), axis=0))
        b = list(pca.explained_variance_ratio_)
        c = [np.sum(pca.explained_variance_ratio_[:i]) for i in range(1, len(pca.explained_variance_ratio_)+1)]
        columns = pd.MultiIndex.from_tuples([("sdev", "Standard deviation"), ("varprop", "Proportion of Variance"), ("cumprop", "Cumulative Proportion")])
        # list() turns the zip iterator into a list of rows that pd.DataFrame accepts
        summary = pd.DataFrame(list(zip(a, b, c)), index=names, columns=columns)
        if out:
            print("Importance of components:")
            display(summary)
        return summary
So just replace

    summary = pd.DataFrame(zip(a, b, c), index=names, columns=columns)

with

    summary = pd.DataFrame(list(zip(a, b, c)), index=names, columns=columns)