I am trying to do some data analysis with PCA from the sklearn package, and I'm running into a problem with how my code handles the data.
A sample of the data is as follows:
wavelength intensity
; [um] [W/m**2/um/sr]
196.078431372549 1.108370393265022E-003
192.307692307692 1.163428008597600E-003
188.679245283019 1.223639983609668E-003
The code written so far is as follows:
import pandas as pd
from astropy.io import ascii
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler(with_mean=True, with_std=True)  # scales the data
# read the data file
data_crescent = ascii.read('earth_crescent.dat', data_start=4958, data_end=13300, delimiter=' ')
# where each variable comes from in the data
y_intensity_crescent = data_crescent['col2'][:]
x_wave_crescent = data_crescent['col1'][:]
# standardize the intensity variable
standard_y_crescent = StandardScaler().fit_transform(y_intensity_crescent)
# PCA run-through of the data
pca = PCA(n_components=2)
principalCrescentY = pca.fit_transform(standard_y_crescent)
principalDfcrescent = pd.DataFrame(data=principalCrescentY,
                                   columns=['principal component 1', 'principal component 2'])
finalDfcrescent = pd.concat([principalDfcrescent, [y_intensity_crescent]], axis=1)
When run, the code raises this error:
ValueError: Expected 2D array, got 1D array instead:
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample
To analyze the data with PCA, it apparently needs to be converted into a 2D array first. Any workaround would be much appreciated!
The problem is that you are passing a single feature, y_intensity_crescent, to your PCA object when you call principalCrescentY=pca.fit_transform(standard_y_crescent). In effect you are giving the PCA algorithm only one dimension. Roughly: principal component analysis takes several features (here, several measured series) and combines them into components, each of which is a linear combination of those features. If you want 2 components, you need at least 2 features.
Here is an example of how to use it properly: PCA tutorial using sklearn
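As a rough sketch of the point about features, the snippet below uses synthetic stand-ins for the two columns (the values, the array length of 50, and the names x_wave, y_intensity and X are assumptions for illustration, not the real earth_crescent.dat data). It shows that reshape(-1, 1) gives a 2D array that still has only one feature, so n_components=2 is rejected, whereas a matrix with two feature columns works. Whether wavelength and intensity are the right two features for your analysis is a separate question; PCA is more commonly applied to several spectra or measurements per sample.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the two columns of the data file (assumed values).
rng = np.random.default_rng(0)
x_wave = np.linspace(100.0, 200.0, 50)             # wavelength [um]
y_intensity = 1e-3 + 1e-5 * rng.normal(size=50)    # intensity

# reshape(-1, 1) makes the array 2D, but it still holds a single feature,
# so PCA cannot extract 2 components from it.
single_feature = y_intensity.reshape(-1, 1)        # shape (50, 1)
try:
    PCA(n_components=2).fit(single_feature)
    single_feature_ok = True
except ValueError:
    single_feature_ok = False

# Stack both columns into an (n_samples, 2) matrix: two features per sample.
X = np.column_stack([x_wave, y_intensity])
X_std = StandardScaler().fit_transform(X)          # standardize each column
components = PCA(n_components=2).fit_transform(X_std)

print(single_feature.shape, single_feature_ok, components.shape)
```

With real data, the same pattern would apply to whatever set of columns you decide to treat as features, as long as there are at least as many features as requested components.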