Hi, I'm new to machine learning. I want to train a KNN classifier on a dataset that has 50 complete records (no missing values) and 103 incomplete records (with missing values).
Is this dataset adequate for classification, or should I look for a different dataset?
I'm attaching some screenshots of my dataset. POS is the class label.
If your feature space has size n (i.e., the number of input columns), then k*n complete samples, with k >= 3, should be a good amount of data to start with.
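To make the k*n rule of thumb concrete, here is a minimal sketch that checks it against the 50 complete records from the question. The function name `min_complete_samples` and the feature count of 12 are assumptions for illustration only; the actual number of input columns is not stated in the question.

```python
def min_complete_samples(n_features, k=3):
    """Complete-sample count suggested by the k*n rule of thumb."""
    if k < 3:
        raise ValueError("the rule of thumb assumes k >= 3")
    return k * n_features


n_complete = 50   # complete records, as stated in the question
n_features = 12   # hypothetical value; replace with your real column count

needed = min_complete_samples(n_features)
enough = n_complete >= needed

# Largest feature count that 50 complete records support at k = 3
max_features = n_complete // 3

print(needed, enough, max_features)
```

With k = 3, 50 complete records cover up to 16 input columns under this heuristic. It is only a starting point, not a guarantee of good KNN accuracy.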
You can also look into imputing the missing values, for example with the column mean or another imputation method.
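As a rough illustration of mean imputation, the sketch below fills each missing value with the mean of the observed values in the same column. It uses only the standard library; the helper name `impute_column_means` and the toy rows are invented for this example and are not taken from your dataset. It assumes missing values are encoded as NaN and that every column has at least one observed value.

```python
import math


def impute_column_means(rows):
    """Return a copy of rows with NaN entries replaced by the column mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        # Use only the values that are actually present in column j.
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        # Assumption: each column has at least one observed value.
        means.append(sum(observed) / len(observed))
    return [
        [means[j] if math.isnan(v) else v for j, v in enumerate(r)]
        for r in rows
    ]


# Toy data, invented for illustration
rows = [
    [1.0, 10.0],
    [3.0, math.nan],
    [math.nan, 30.0],
]
filled = impute_column_means(rows)
print(filled)
```

If you use scikit-learn, `SimpleImputer(strategy="mean")` does the same job. Because KNN relies on distances, it is usually worth scaling the features after imputing, and computing the means on the training split only.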
One rough heuristic that is sometimes advocated is that the number of data points should be no less than some multiple (say 5 or 10) of the number of adaptive parameters in the model. - Bishop, Page no. 9