
Unexpected NaN values when using pd.concat after PCA. How do I deal with this? Also, PCA vs. t-SNE?


I am trying to reduce the dimensionality of my data using PCA. However, when I use pd.concat to join the result with the customer age, NaN values appear in the output. The customer age has also become float, although it was int. Can someone please tell me how to solve this problem? I would also appreciate advice on whether I should use PCA or t-SNE to visualize data with 14 variables (one column takes only four distinct values (1, 2, 3, 4) across 12,000 rows, and two columns are Boolean).

x and y

import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Separate out the demographic features.
x = Demo_Data.values

# Separate out the target (customer age).
y = df2.loc[:, ['Customer_Age']].values

# Standardize the features.
scaler = StandardScaler()
x = scaler.fit_transform(x)

# Project the standardized features onto two principal components.
pca = PCA(n_components=2)
Demography_Data = pca.fit_transform(x)
principalDf = pd.DataFrame(data=Demography_Data,
                           columns=['Demography_Data 1', 'Demography_Data 2'])

# This concat is where the NaN values appear.
finalDf = pd.concat([principalDf, df2[['Customer_Age']]], axis=1)

Solution

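  • The indexes of your DataFrames do not match. pd.concat with axis=1 aligns rows by index label, not by position, so any label present in only one frame produces a row padded with NaN. Because a regular int64 column cannot hold NaN, pandas also converts it to float, which is why Customer_Age changed type: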

    >>> import pandas as pd
    >>> df1 = pd.DataFrame([11,22,33])
    >>> df2 = pd.DataFrame([111,222,333], index=[1,2,3])
    >>> pd.concat((df1,df2),axis=1)
          0      0
    0  11.0    NaN
    1  22.0  111.0
    2  33.0  222.0
    3   NaN  333.0
    

    However, once the indexes match, no NaN values appear and the integers stay integers:

    >>> df2.index=df1.index
    >>> pd.concat((df1,df2),axis=1)
        0    0
    0  11  111
    1  22  222
    2  33  333