I am trying to reduce the dimensionality of my data using PCA. However, when I use pd.concat to join the result with another column, it produces NaN values. The Customer_Age column has also become float, although it was int. How can I solve this problem?
I would also appreciate advice on whether I should use PCA or t-SNE to visualize data with 14 variables. One column contains only 4 distinct values (1, 2, 3, 4) across 12,000 rows, and two columns are booleans.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Separating out the demographic features.
x = Demo_Data.values
# Separating out the target (customer age).
y = df2.loc[:, ['Customer_Age']].values
# Standardizing the features.
scaler = StandardScaler()
x = scaler.fit_transform(x)
# Projecting the standardized features onto two principal components.
pca = PCA(n_components=2)
Demography_Data = pca.fit_transform(x)
principalDf = pd.DataFrame(data=Demography_Data,
                           columns=['Demography_Data 1', 'Demography_Data 2'])
finalDf = pd.concat([principalDf, df2[['Customer_Age']]], axis=1)
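The indexes of your DataFrames do not match. With axis=1, pd.concat aligns rows by index label rather than by position, so any label present in only one frame gets NaN in the other frame's columns. Because NaN is a float, an integer column that receives one is upcast to float: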
>>> import pandas as pd
>>> df1 = pd.DataFrame([11,22,33])
>>> df2 = pd.DataFrame([111,222,333], index=[1,2,3])
>>> pd.concat((df1,df2),axis=1)
      0      0
0  11.0    NaN
1  22.0  111.0
2  33.0  222.0
3   NaN  333.0
However, once both frames share the same index, the rows line up and the integer dtype is kept:
>>> df2.index=df1.index
>>> pd.concat((df1,df2),axis=1)
    0    0
0  11  111
1  22  222
2  33  333
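
Applied to the code in the question, the same idea might look like the sketch below. It assumes df2 carries a non-default index (for example, because rows were filtered out earlier), which the question does not show. The small df2 and the components array are made-up stand-ins for the real data and for the output of pca.fit_transform(x).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df2: a frame whose index is no longer 0..n-1,
# as happens after filtering or dropping rows.
df2 = pd.DataFrame({"Customer_Age": [25, 41, 37]}, index=[3, 7, 9])

# Made-up stand-in for the array returned by pca.fit_transform(x).
components = np.array([[0.1, -1.2], [0.5, 0.3], [-0.6, 0.9]])
columns = ["Demography_Data 1", "Demography_Data 2"]

# Option 1: build the PCA frame on df2's index so the labels match.
principalDf = pd.DataFrame(components, columns=columns, index=df2.index)
finalDf = pd.concat([principalDf, df2[["Customer_Age"]]], axis=1)

# Option 2: discard df2's index so both frames use the default 0..n-1.
finalDf_reset = pd.concat(
    [
        pd.DataFrame(components, columns=columns),
        df2[["Customer_Age"]].reset_index(drop=True),
    ],
    axis=1,
)

print(finalDf)
print(finalDf_reset)
```

Option 1 keeps the original row labels, which helps if you need to trace rows back to df2 later. Option 2 is simpler when the labels carry no meaning. Both rely on the PCA output having the same row order as df2.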