I have a categorical training set like this:
col1 col2 col3 col4
9 8 10 9
10 8 9 9
.....................
After reducing the dimensions by applying MCA (Multiple Correspondence Analysis) to it, I got something like this:
dim1 dim2
0.857 -0.575
0.654 0.938
.............
Now my question is: how do I find the (dim1, dim2) coordinates of a new data point like this one, given as input?
col1 col2 col3 col4
10 9 8 8
The outputs of MCA after fitting it on the training set are eigenvalues, inertia, etc.
My code in Python:
import pandas as pd
import prince
from sklearn.cluster import KMeans
data = pd.read_csv("data/training set.csv")
X = data.loc[:, 'OS.1':'DSA.1']
size = len(X)
X = X.values.tolist()
#...
#data preprocessing
#...
df = pd.DataFrame(X)
mca = prince.MCA(
    n_components=2,
    n_iter=3,
    copy=True,
    check_input=True,
    engine='auto',
    random_state=42
)
mca = mca.fit(df)
X = mca.transform(df)
km = KMeans(n_clusters=3)
km.fit(X)
What I want to do:
1. Take an input from the user.
2. Preprocess it before performing dimensionality reduction with MCA.
3. Predict its cluster using k-means.
You just need to keep your fitted MCA object, mca, alive so that you can use it to transform new input data. To do that, call its transform method on the new data (do not call fit again, or the dimensions will no longer match the ones KMeans was trained on):
import pandas as pd
import prince
from sklearn.cluster import KMeans
data = pd.read_csv("data/training set.csv")
X = data.loc[:, 'OS.1':'DSA.1']
size = len(X)
X = X.values.tolist()
#...
#data preprocessing
#...
df = pd.DataFrame(X)
mca = prince.MCA(
    n_components=2,
    n_iter=3,
    copy=True,
    check_input=True,
    engine='auto',
    random_state=42
)
mca = mca.fit(df)
X = mca.transform(df)
km = KMeans(n_clusters=3)
km.fit(X)
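# Put the new data into x_new (same columns, in the same order, as the training set)
# 1. Preprocess x_new exactly as you preprocessed X
# 2. Reuse the fitted mca on x_new (transform only, no refit)
df_new = pd.DataFrame(x_new)
X_new = mca.transform(df_new)
# 3. Predict the cluster with the fitted KMeans model
km.predict(X_new)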
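If the user input arrives in a later session rather than in the same script, "keeping the object alive" means saving the fitted objects to disk. Below is a minimal sketch using the standard-library pickle module. The helper names save_models and load_models and the file name models.pkl are made up for illustration, and it assumes that your fitted prince.MCA object can be pickled (scikit-learn estimators such as KMeans can).

```python
import pickle
from pathlib import Path


def save_models(mca, km, path):
    """Serialize the fitted MCA and KMeans objects into a single file.

    Hypothetical helper: assumes both objects are picklable.
    """
    with Path(path).open("wb") as f:
        pickle.dump({"mca": mca, "km": km}, f)


def load_models(path):
    """Load the fitted objects back from disk; returns (mca, km)."""
    with Path(path).open("rb") as f:
        bundle = pickle.load(f)
    return bundle["mca"], bundle["km"]
```

After fitting, you would call save_models(mca, km, "models.pkl"). Later, mca, km = load_models("models.pkl") gives you the same fitted objects back, and km.predict(mca.transform(df_new)) works as above. Only unpickle files you created yourself, and load them with the same library versions you used when saving.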