I have a function F, [bool] = F(DATASET, tresh1, tresh2), that takes as input a DATASET and some parameters, for example two threshold values, tresh1 and tresh2, and returns a boolean: 1 if DATASET is "good", 0 otherwise. The answer depends on the values of tresh1 and tresh2, of course.
Suppose I have 100 DATASETs available and I know which ones are good and which are not. I would like to "train" my function F, i.e. find a pair of values tresh1_ and tresh2_ such that F(DATASET, tresh1_, tresh2_) returns true for all (or most of) the "good" DATASETs and false otherwise.
I expect that F(DATASET_, tresh1_, tresh2_), where DATASET_ is a new one (different from the previous 100), returns true if DATASET_ really is "good".
I could see this as a clustering problem: for every DATASET in the training set I choose random tresh1 and tresh2 values and record which values make F return the correct answer and which do not. From that I select a region where the tresh1 and tresh2 values are "good" (rough sketch below). Is that a good method? Are there better ones?
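For concreteness, here is a rough Python sketch of what I mean. The F below is only a stand-in (it thresholds the mean and the spread of the data) and the datasets/labels are toy placeholders; only the overall structure reflects my real setup:

    import random
    import statistics

    # Stand-in for my real F: returns True ("good") when the mean and the spread
    # of the data fall below the two thresholds. Replace with the actual F.
    def F(dataset, tresh1, tresh2):
        return statistics.mean(dataset) < tresh1 and statistics.stdev(dataset) < tresh2

    # Toy training set: 100 DATASETs with known good/bad labels (placeholders).
    random.seed(0)
    datasets = [[random.gauss(mu, 1.0) for _ in range(50)] for mu in range(100)]
    labels = [mu < 50 for mu in range(100)]  # pretend the first half are "good"

    def accuracy(t1, t2):
        # Fraction of training DATASETs on which F agrees with the known label.
        return sum(F(d, t1, t2) == y for d, y in zip(datasets, labels)) / len(datasets)

    # Sample random threshold pairs and record how well each one does, hoping to
    # map out a region of the (tresh1, tresh2) plane where F is usually right.
    samples = [(random.uniform(0, 100), random.uniform(0, 5)) for _ in range(1000)]
    scored = [(accuracy(t1, t2), t1, t2) for t1, t2 in samples]
    best_acc, tresh1_, tresh2_ = max(scored)
    print(best_acc, tresh1_, tresh2_)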
In general, it seems to me a "parameter calibration problem". Are there classic techniques to solve it?
What you want to do is commonly known as hyperparameter optimization (or parameter tuning).
See the Wikipedia article for details. The common approach is to perform a grid search, unless you can compute the derivatives of your function F.
This is a search method; it is commonly used in machine learning to optimize a model's performance, but it is not a "machine learning" algorithm itself.
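As a minimal sketch of a grid search (reusing the F, datasets and labels placeholders from the sketch in your question; the grid ranges and step sizes are assumptions you would adapt to your data):

    import itertools

    # Candidate values for each threshold; ranges and steps are assumptions.
    tresh1_grid = [i * 5.0 for i in range(21)]    # 0, 5, ..., 100
    tresh2_grid = [i * 0.5 for i in range(11)]    # 0.0, 0.5, ..., 5.0

    def n_correct(t1, t2):
        # Number of training DATASETs on which F returns the known label.
        return sum(F(d, t1, t2) == y for d, y in zip(datasets, labels))

    # Exhaustively evaluate every (tresh1, tresh2) combination on the training
    # set and keep the pair that classifies the most DATASETs correctly.
    tresh1_, tresh2_ = max(itertools.product(tresh1_grid, tresh2_grid),
                           key=lambda pair: n_correct(*pair))

Holding out some of your 100 labelled DATASETs and evaluating F with the chosen tresh1_, tresh2_ on them gives you an estimate of how well the thresholds generalize to new data, which is what you ultimately care about.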