Search code examples
arithmetic-expressionsgenetic-programming

Finding a relationship without knowing the operators?


1 + 2 + 3 = 6 is a simple equation, but let's say you have something like this:

1 ? 2 ? 3 = 6

How do you find the operators? Is it possible?

I've experimented a bit with gplearn in Python but it seems like you need to know at least some of the operators beforehand to use it properly. Also, is this what symbolic regression is used for?


Solution

  • Genetic programming needs a problem specific function set.

    In gplearn the available function set is controlled by an argument (function_set) that is set when initializing an estimator:

    gp = SymbolicRegressor(function_set=['add', 'sub', 'mul', 'sin', 'abs', 'sqrt'])
    

    For a simple numeric problem, the function set may consist of merely the arithmetic functions (and that's the default set for gplearn).

    In general sufficiency is one of the properties the function set should have. Sufficiency means it is possible to express a solution to the problem using the element of the primitive set (function set + terminal set).

    Unfortunately it can be guaranteed only for certain domains (e.g. ['and', 'or', 'not'] for a Boolean induction problem).

    However in many cases GP can develop expressions that approximate very closely the desired one using an insufficient primitive set.

    Often adding a few unnecessary functions in an attempt to ensure sufficiency doesn't tend to slow down GP overmuch. Anyway you have to try since it can bias the system in unexpected ways.

    There are also techniques to provide a mechanism by which the evolutionary process can evolve potentially reusable components (e.g. ADF - Automatic Defined Functions). ADFs are composed using the primitive set.


    is this what symbolic regression is used for

    Usually also variables and constants (terminal set) are subject to evolution/recombination. The specific problem is a sort of symbolic regression with additional constraints.