python pandas scikit-learn pca

PCA analysis using python scikit-learn - type error


I'm trying to replicate the PCA example found here, but when I run pca_summary() I get the following error. Any thoughts would be much appreciated. Thanks!

   raise TypeError("data argument can't be an iterator")
TypeError: data argument can't be an iterator

Solution

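  • This is a common problem caused by zip.

    In Python 3, zip changed: it now returns an iterator instead of a list, and the DataFrame constructor in the pandas version raising this error does not accept an iterator as its data argument.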

    see also here

    In the pca_summary function do this:

    import numpy as np
    import pandas as pd
    from IPython.display import display  # needed outside a Jupyter notebook

    def pca_summary(pca, standardised_data, out=True):
        names = ["PC"+str(i) for i in range(1, len(pca.explained_variance_ratio_)+1)]
        # Standard deviation of the scores on each principal component
        a = list(np.std(pca.transform(standardised_data), axis=0))
        b = list(pca.explained_variance_ratio_)
        # Cumulative proportion of variance explained up to each component
        c = [np.sum(pca.explained_variance_ratio_[:i]) for i in range(1, len(pca.explained_variance_ratio_)+1)]
        columns = pd.MultiIndex.from_tuples([("sdev", "Standard deviation"), ("varprop", "Proportion of Variance"), ("cumprop", "Cumulative Proportion")])
        # zip() returns an iterator in Python 3, so materialise it with list()
        summary = pd.DataFrame(list(zip(a, b, c)), index=names, columns=columns)
        if out:
            print("Importance of components:")
            display(summary)
        return summary
    

    So just replace

    summary = pd.DataFrame(zip(a, b, c), index=names, columns=columns)
    

    with

    summary = pd.DataFrame(list(zip(a, b, c)), index=names, columns=columns)
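
To see why wrapping zip in list() makes a difference, here is a minimal sketch using only the standard library. The values in a, b and c are made up for illustration and are not real PCA output. It shows that in Python 3 zip gives back a one-shot iterator, while list(zip(...)) gives a plain list of row tuples, which is the kind of input the DataFrame constructor expects here. Newer pandas releases may accept the bare iterator as well, so whether you hit the error can depend on your installed version.

```python
from collections.abc import Iterator, Sequence

# Made-up stand-ins for the sdev, varprop and cumprop lists in pca_summary
a = [1.5, 0.9]
b = [0.7, 0.3]
c = [0.7, 1.0]

rows = zip(a, b, c)

# In Python 3, zip() returns a lazy iterator, not a list-like sequence
is_iterator = isinstance(rows, Iterator)
is_sequence = isinstance(rows, Sequence)

# list() consumes the iterator and produces one tuple per row
first_pass = list(rows)

# The iterator is now exhausted, so a second pass yields nothing
second_pass = list(rows)

print(is_iterator, is_sequence)
print(first_pass)
print(second_pass)
```

Because the iterator can only be consumed once, materialising it with list() at the point where the DataFrame is built is the simplest fix.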