Search code examples
machine-learningdatasetlogistic-regression

Separating a dataset in the form of two half semi circle


There are two semi-circles of width thk with inner radius rad, separated by sep as shown (red is -1 and blue is + 1). The center of the top semi-circle is aligned with the middle of the edge of the bottom semi-circle. This task is linearly separable when sep >= 0, and not so for sep <0. Set rad = 10, thk = 5 and sep = 5 . Then, generate 2,000 examples uniformly, which means you will have approximately 1,000 examples for each class.

Figure Describing the problem:

enter image description here

I used logistic regression to separate the datasets in semicircles. When sep is negative the result of logistic regression is not the best.

I want to add a new nonlinear feature to dataset, after which the logistic regression will produce better result.


Solution

  • The problem you have here is that Logistic regression is a classifier with linear boundaries. To solve this you can use something called the kernel trick which essentially lets you transform your data into a space that is linearly separable.

    If you choose a kernel such as the Radial Basis function (RBF) you should be able to the datasets 100% accuracy.

    A more simple way would be to add a feature such as x^2y^2 as a 3rd column to your dataset which may give better results but is usually used when your data is an ellipsoid.