Tags: python, deep-learning, dataset

If the UMAP projection of my dataset looks bad, does that mean the data is unclassifiable?


I built a dataset for my own research that has about 30,000 samples.

Each sample has 20 float features as input and a label from one of 4 classes.
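For concreteness, here is a minimal sketch of the data layout just described, assuming the features sit in a NumPy array and the labels are integers 0 to 3. The arrays are random placeholders, not the actual research data. `np.bincount` then reports how many samples fall in each class, which is a quick first check for class imbalance.

```python
import numpy as np

# Hypothetical placeholder data with the shape described in the question:
# ~30,000 samples, 20 float features each, integer labels in {0, 1, 2, 3}.
N_SAMPLES, N_FEATURES, N_CLASSES = 30_000, 20, 4

rng = np.random.default_rng(0)
X = rng.normal(size=(N_SAMPLES, N_FEATURES)).astype(np.float32)
y = rng.integers(0, N_CLASSES, size=N_SAMPLES)

# Per-class sample counts; a dominant class 0 would show up here as imbalance.
counts = np.bincount(y, minlength=N_CLASSES)
for cls, n in enumerate(counts):
    print(f"class {cls}: {n} samples ({n / N_SAMPLES:.1%})")
```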

Training with every network model I tried went badly (it always overfit), so I plotted a UMAP embedding and got the following result:

[Figure: UMAP embedding of the full dataset, colored by class]

Class 0 (dark blue) is scattered across the entire embedding. Class 0 represents "data that should be ignored during processing".

When I drop 99% of the class-0 samples, the UMAP embedding becomes:

[Figure: UMAP embedding with 99% of the class-0 samples removed]

As you can see, the classes are now much more clearly separated.
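Dropping most of class 0 in this way can be done with index masks. A minimal sketch, assuming NumPy arrays `X` (features) and `y` (integer labels); the helper `drop_most_of_class` and its parameters are hypothetical, not taken from the original post:

```python
import numpy as np


def drop_most_of_class(X, y, cls=0, keep_frac=0.01, seed=0):
    """Keep every sample not in `cls`, plus a random `keep_frac` of the `cls` samples."""
    rng = np.random.default_rng(seed)
    cls_idx = np.flatnonzero(y == cls)            # indices of the class to thin out
    n_keep = int(round(keep_frac * cls_idx.size))
    kept_cls = rng.choice(cls_idx, size=n_keep, replace=False)
    keep = np.concatenate([np.flatnonzero(y != cls), kept_cls])
    keep.sort()                                   # preserve the original sample order
    return X[keep], y[keep]


# Usage with toy labels: 1,000 class-0 samples and 100 of each other class.
y_demo = np.array([0] * 1000 + [1] * 100 + [2] * 100 + [3] * 100)
X_demo = np.arange(y_demo.size, dtype=float)[:, None] * np.ones((1, 20))  # row i holds value i
X_small, y_small = drop_most_of_class(X_demo, y_demo)
print(np.bincount(y_small))  # class counts: 10 for class 0, 100 for classes 1-3
```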

However, the class-0 samples are very important, so I cannot remove them entirely.

In this case, what should I do to get the best deep learning result?

I would be grateful for any suggestions, even ones with only a small chance of working.


Solution

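  • As @Shasa mentioned in a comment, I was able to improve my UMAP result.

    I just added reducer.fit(digits.data, y=label_list) to my UMAP transform code, which makes UMAP use the class labels when building the embedding (supervised dimension reduction).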