Search code examples
pythonregressioncurve-fittinggenetic-algorithmsupervised-learning

Deriving functions from data series


I'm trying to write a program in python that uses a genetic algorithm to extract a function that matches a set of data.

For example, if I input:

3, 5, 7, 9, 11

The output should be something like:

2x + 1

Problem is, I'm trying to do this in Python and I have no idea how to go about modelling mathematical functions in Python. Will I need something like a neural network?

I have written a generic genetic algorithm ready to modify once I figure out how to even begin.

Thanks a lot.


Solution

  • What you're looking for is called symbolic regression.

    Since you're already on the genetic algorithms path, you could take a further step trying genetic programming.

    With genetic programming each individual is a "program" represented using terminals and primitive functions (building blocks).

    E.g, for symbolic regression:

    terminal set = {x, 1, 2, 3, 4, 5, 6, 7, 8, 9}
    function set = {+, -, *, /, exp, log, sin}
    

    A "simple" program (2x + 1):

                       +
                      / \
                     2   +
                        / \
                       *   \
                      / \   \
                     10  x   \
                              -
                             / \
                            *   1
                           / \
                          -   x
                         / \
                        3  11
    

    This is represented as a tree structure, but non-tree representations have been successfully implemented.

    GP has the disadvantage that a large search space have to be explored and it's very computationally expensive. This can be attenuated by limiting the set of building blocks provided to the algorithm, based on existing knowledge of the system that produced the data.

    Nevertheless this characteristic of symbolic regression also has advantages: because the evolutionary algorithm requires diversity in order to effectively explore the search space.

    If yours is a learning project, python is a good language to develop a simple framework within a relatively short time.

    Otherwise you could take a look at DEAP (a good Python-based evolutionary computation framework including Genetic Programming).