How to improve a self-defined PCA function?

I'm working on a small project that needs to run PCA within groups (a group-by PCA). Everything works, but I'm looking for a way to improve my self-defined PCA function.

The self-defined function I use:

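import numpy as np
import pandas as pd
from scipy import stats

def pca(data):
    try:
        # column-wise z-scores, ignoring NaNs when computing mean/std
        x = stats.zscore(data, nan_policy='omit')
        covar = np.cov(x, rowvar=False)
        eigval, eigvec = np.linalg.eig(covar)
    except Exception:
        # all-NaN fallback, indexed like the group
        return pd.Series(np.nan, index=data.index)
    else:
        # projection onto the first eigenvector returned by eig
        return x@eigvec[:, :1]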
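One thing worth checking in this function: np.linalg.eig does not return eigenvalues in any particular order, so eigvec[:, :1] is not guaranteed to be the first principal component. Because a covariance matrix is symmetric, np.linalg.eigh is a natural alternative (a swap, not what the function above uses): it returns real eigenvalues in ascending order, so the first principal axis is the last column. A minimal sketch, with a made-up 2x2 covariance matrix and a hypothetical helper name:

```python
import numpy as np

def first_pc_direction(covar):
    """Unit eigenvector of a symmetric covariance matrix that belongs to
    its largest eigenvalue, i.e. the first principal axis."""
    # eigh returns real eigenvalues in ascending order, so the
    # eigenvector for the largest one is the last column.
    _, eigvec = np.linalg.eigh(covar)
    return eigvec[:, -1]

# Made-up covariance: the second variable has 4x the variance of the first,
# so the first principal axis should be (0, 1), up to sign.
covar = np.array([[1.0, 0.0],
                  [0.0, 4.0]])
v = first_pc_direction(covar)
print(np.abs(v))
```

Eigenvectors are only defined up to sign, so PC scores from different groups may come out with flipped signs.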

I use this function to calculate the first principal component within each gvkey group as follows:

sam.groupby('gvkey')[['xgat', 'xgsale', 'xcap']].apply(pca)

Everything works fine. The only small issue is that the output has three columns: the first is gvkey, the second is empty, and the third is 0.
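The extra level can be reproduced on toy data. A minimal sketch, assuming pandas 2.x (where the default group_keys=True also applies to functions that return like-indexed frames); the data and the stand-in function like_indexed are hypothetical, and only the column names come from the question:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for `sam`.
sam = pd.DataFrame({
    "gvkey":  [1, 1, 1, 2, 2, 2],
    "xgat":   [1.0, 2.0, 4.0, 3.0, 1.0, 0.0],
    "xgsale": [2.0, 1.0, 5.0, 0.0, 2.0, 1.0],
    "xcap":   [0.5, 1.5, 1.0, 2.0, 2.5, 1.0],
})

def like_indexed(group):
    # Stand-in for `x @ eigvec[:, :1]`: a one-column DataFrame
    # (column label 0) that keeps the group's own row index.
    return pd.DataFrame(group.to_numpy() @ np.ones((3, 1)), index=group.index)

out = sam.groupby("gvkey")[["xgat", "xgsale", "xcap"]].apply(like_indexed)
print(list(out.index.names))  # ['gvkey', None]
print(out.columns.tolist())   # [0]
```

The unnamed second level is the original row index of sam, and 0 is the column label produced by the matrix product.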

What I want: improve my self-defined function so that the output has no second index column. In general, the result should look like the output of sam.groupby('gvkey')['col'].transform('mean').
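For reference, transform returns one value per original row, indexed exactly like the input, which is the shape described above. A small sketch on made-up numbers:

```python
import pandas as pd

# Made-up data; only the column names come from the question.
sam = pd.DataFrame({
    "gvkey": [1, 1, 2, 2, 2],
    "xgat":  [1.0, 3.0, 2.0, 4.0, 6.0],
})

# One value per original row, aligned to sam.index, with no group-key level.
group_mean = sam.groupby("gvkey")["xgat"].transform("mean")
print(group_mean.tolist())                 # [2.0, 2.0, 4.0, 4.0, 4.0]
print(group_mean.index.equals(sam.index))  # True
```

Because the index matches, such a result can be assigned straight back as a new column of sam.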

I do not want a workaround such as reset_index(), e.g. sam.groupby('gvkey')[['xgat', 'xgsale', 'xcap']].apply(pca).reset_index(level=1, drop=True).

Solution

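You can pass group_keys=False to groupby so that apply does not add the group key (gvkey) as an extra outer index level. The function below also catches only LinAlgError instead of every exception: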

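import numpy as np
from scipy import stats
from scipy.linalg import LinAlgError  # the same class as np.linalg.LinAlgError

def pca(data):
    try:
        x = stats.zscore(data, nan_policy='omit')
        covar = np.cov(x, rowvar=False)
        eigval, eigvec = np.linalg.eig(covar)
    except LinAlgError:
        # e.g. a one-row group gives a NaN covariance matrix, which eig rejects;
        # a None result is left out of the combined output
        return None
    else:
        # eig does not sort eigenvalues, so take the column of the largest one
        return x @ eigvec[:, [np.argmax(eigval)]]

# group_keys=False: do not prepend gvkey as an extra index level
sam.groupby('gvkey', group_keys=False)[['xgat', 'xgsale', 'xcap']].apply(pca)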
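Putting the pieces together, here is a self-contained sketch on hypothetical toy data. It differs from the function above in a few assumed ways: it z-scores by hand (the same as stats.zscore with its default ddof=0), it uses np.linalg.eigh and takes the last eigenvector, and instead of relying on LinAlgError (which np.linalg.eig raises when, for example, a one-row group produces a NaN covariance matrix) it returns NaN for groups with fewer than two rows, missing values, or a constant column. The helper name pc1_scores and the output column pc1 are illustrative only.

```python
import numpy as np
import pandas as pd

COLS = ["xgat", "xgsale", "xcap"]

def pc1_scores(data: pd.DataFrame) -> pd.DataFrame:
    """First principal-component score for every row of one group.

    Returns a one-column DataFrame that keeps the group's own index, so
    groupby(..., group_keys=False).apply(...) lines up with the input rows.
    """
    values = data.to_numpy(dtype=float)
    all_nan = pd.DataFrame({"pc1": np.nan}, index=data.index)
    # Fewer than two rows or missing values make the sample covariance
    # undefined: return NaN for the whole group instead.
    if len(values) < 2 or np.isnan(values).any():
        return all_nan
    std = values.std(axis=0)  # ddof=0, the same default as stats.zscore
    if (std == 0).any():      # a constant column cannot be z-scored
        return all_nan
    x = (values - values.mean(axis=0)) / std
    covar = np.cov(x, rowvar=False)
    # eigh: symmetric input, eigenvalues in ascending order, so the
    # eigenvector of the largest eigenvalue is the last column.
    _, eigvec = np.linalg.eigh(covar)
    return pd.DataFrame({"pc1": x @ eigvec[:, -1]}, index=data.index)

# Hypothetical toy data; gvkey 3 has a single row and should get NaN.
sam = pd.DataFrame({
    "gvkey":  [1, 1, 1, 2, 2, 2, 3],
    "xgat":   [1.0, 2.0, 4.0, 3.0, 1.0, 0.0, 5.0],
    "xgsale": [2.0, 1.0, 5.0, 0.0, 2.0, 1.0, 1.0],
    "xcap":   [0.5, 1.5, 1.0, 2.0, 2.5, 1.0, 3.0],
})

pc1 = sam.groupby("gvkey", group_keys=False)[COLS].apply(pc1_scores)
sam["pc1"] = pc1["pc1"]  # aligned on the original row index
print(sam)
```

Returning a frame that keeps the group's own index is what lets group_keys=False line the result up with the original rows, much like transform. Within each group the scores sum to zero, because the z-scored columns have zero mean.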