I have a dataset loaded into a pandas DataFrame with 9 features and 249 rows. I would like to get the covariance matrix among the 9 features (a 9 x 9 matrix). However, when I use df.cov(), I only get a 3 x 3 matrix. What am I doing wrong here?
Below is my code snippet
# perform data preprocessing
# keep only players with an MPG of at least 20 and select only the required columns
MPG_df = df.loc[df['MPG'] >= 20]
processed_df = MPG_df[["FT%", "2P%", "3P%", "PPG", "RPG", "APG", "SPG", "BPG", "TOPG"]]
processed_df
When I try to compute the covariance matrix with the code below, I only get a 3 x 3 matrix:
# expected a 9 x 9 matrix, but this returns only 3 x 3
cov_processed_df = pandas.DataFrame(processed_df, columns=['FT%', '2P%', '3P%', 'PPG', 'RPG', 'APG', 'SPG', 'BPG', 'TOPG']).cov()
cov_processed_df
Thanks!
The excluded columns are probably stored as non-numeric, even though they look numeric. Try converting them to float first:
cov_processed_df = processed_df.astype(float).cov()
To check the data type of each column, you can run:
print(processed_df.dtypes)
If "object" appears in the result, those columns are non-numeric. A column is flagged as non-numeric if it contains even one non-numeric value.
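If a column holds a value that cannot be parsed as a number at all (a placeholder such as "-", for example), astype(float) raises a ValueError. The following is a minimal sketch of one way to handle that case, not necessarily what your data needs: it uses made-up toy data (toy_df and its values are hypothetical), lists the columns that are not numeric, and converts everything with pandas.to_numeric, turning unparseable entries into NaN.

```python
import pandas as pd

# Hypothetical toy data: "3P%" was read in as text and holds one bad entry ("-").
toy_df = pd.DataFrame({
    "FT%": [0.80, 0.75, 0.90, 0.65],
    "3P%": ["0.35", "0.40", "-", "0.38"],
    "PPG": [12.1, 8.4, 20.3, 5.6],
})

# List the columns that pandas does not treat as numeric.
non_numeric_cols = [
    col for col in toy_df.columns
    if not pd.api.types.is_numeric_dtype(toy_df[col])
]
print(non_numeric_cols)

# Convert every column; entries that cannot be parsed become NaN instead of raising.
numeric_df = toy_df.apply(pd.to_numeric, errors="coerce")
print(numeric_df.dtypes)

# cov() ignores missing values pairwise, so every column appears in the result.
cov_matrix = numeric_df.cov()
print(cov_matrix.shape)
```

Coercing to NaN silently drops the bad entries from the calculation, so it is worth checking numeric_df.isna().sum() afterwards to see how many values were affected.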