I built a dataset for my own research; it has about 30,000 samples.
Each sample has 20 floats as input and belongs to one of 4 classes.
Training went badly with every network model I tried (they always overfit), so I plotted a UMAP projection and got the following result:
Here class 0 (dark blue) is spread everywhere; this class means "the data should be ignored during processing".
When I drop 99% of the class-0 data, the UMAP becomes:
As you can see, the result is nice.
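For reference, the 99% subsampling of class 0 can be sketched roughly as follows. This is a minimal NumPy sketch, not my exact code: the helper name `subsample_class`, its parameters, and the random arrays standing in for my real features and labels are all made up for illustration.

```python
import numpy as np

def subsample_class(features, labels, target_class=0, keep_fraction=0.01, seed=0):
    """Keep only `keep_fraction` of `target_class`; keep every other sample.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    target_idx = np.flatnonzero(labels == target_class)
    other_idx = np.flatnonzero(labels != target_class)
    # Keep at least one sample of the target class when any exist.
    n_keep = min(target_idx.size, max(1, int(round(keep_fraction * target_idx.size))))
    kept = rng.choice(target_idx, size=n_keep, replace=False)
    # Sort so the surviving samples stay in their original order.
    selected = np.sort(np.concatenate([other_idx, kept]))
    return features[selected], labels[selected]

# Random stand-in data with the same shape as the dataset described above.
rng = np.random.default_rng(1)
X = rng.normal(size=(30_000, 20))
y = rng.integers(0, 4, size=30_000)

X_sub, y_sub = subsample_class(X, y)
print(np.bincount(y), np.bincount(y_sub))
```

The subsampled arrays are only used for the visualization here; the full dataset is unchanged.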
However, the class-0 data are very important, so I cannot remove them entirely.
In this case, what should I do to get the best deep learning result?
Please share any solution, even one with only a small chance of working; I would be very grateful.
As @Shasa mentioned in a comment, I was able to improve my UMAP result.
I just added
reducer.fit(digits.data, y=label_list)
to my UMAP transform code, which passes the labels to UMAP so that the fit is supervised.