Tags: regex, artificial-intelligence, theory, automata, grammar-induction

Is it possible for a computer to "learn" a regular expression by user-provided examples?



To clarify:

  • I do not want to learn regular expressions.
  • I want to create a program which "learns" a regular expression from examples which are interactively provided by a user, perhaps by selecting parts from a text or selecting begin or end markers.

Is it possible? Are there algorithms, keywords, etc. which I can Google for?

EDIT: Thank you for the answers, but I'm not interested in tools which provide this feature. I'm looking for theoretical information, like papers, tutorials, source code, names of algorithms, so I can create something for myself.


Solution

  • The book An Introduction to Computational Learning Theory (Kearns and Vazirani) contains an algorithm for learning a finite automaton. Every regular language is recognized by some finite automaton, and regular expressions and finite automata can be converted into each other, so a program that learns the automaton has in effect learned a regular expression. This does not work in every setting: Kearns and Valiant show cases where a finite automaton cannot be learned efficiently from examples alone. A related problem is learning hidden Markov models, which are probabilistic automata that can describe a character sequence. Note that most modern "regular expressions" in programming languages are strictly more expressive than regular languages (backreferences are the usual example), and are therefore sometimes harder to learn.
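
To make the idea of learning an automaton from examples concrete, here is a minimal sketch in Python. It is not the algorithm from the book. It is a simplified state-merging learner in the spirit of the RPNI family: build a prefix tree from the positive examples, then greedily merge states as long as no negative example becomes accepted. All function names (`build_pta`, `merge`, `accepts`, `learn`) and the sample strings are made up for illustration.

```python
def build_pta(positives):
    """Prefix tree acceptor: one state per distinct prefix of a positive example."""
    trans = {0: {}}          # state -> {symbol: next state}; state 0 is the start
    accepting = set()
    for word in positives:
        state = 0
        for ch in word:
            if ch not in trans[state]:
                new_state = len(trans)
                trans[new_state] = {}
                trans[state][ch] = new_state
            state = trans[state][ch]
        accepting.add(state)
    return trans, accepting


def merge(trans, accepting, a, b):
    """Return a copy of the automaton with states a and b merged.

    Merging can create two transitions on the same symbol from one state;
    their targets are then merged as well, so the result stays deterministic.
    """
    trans = {s: dict(out) for s, out in trans.items()}
    parent = {s: s for s in trans}

    def find(s):
        while parent[s] != s:
            s = parent[s]
        return s

    pending = [(a, b)]
    while pending:
        x, y = (find(s) for s in pending.pop())
        if x == y:
            continue
        if y < x:
            x, y = y, x          # keep the smaller id, so state 0 stays the start
        parent[y] = x
        for ch, target in trans.pop(y).items():
            if ch in trans[x]:
                pending.append((trans[x][ch], target))
            else:
                trans[x][ch] = target

    new_trans = {s: {ch: find(t) for ch, t in out.items()}
                 for s, out in trans.items()}
    return new_trans, {find(s) for s in accepting}


def accepts(trans, accepting, word):
    state = 0
    for ch in word:
        if ch not in trans[state]:
            return False
        state = trans[state][ch]
    return state in accepting


def learn(positives, negatives):
    """Greedy state merging: keep a merge only if every negative is still rejected."""
    trans, accepting = build_pta(positives)
    for b in sorted(trans):
        if b not in trans:       # b was already merged into another state
            continue
        for a in sorted(trans):
            if a >= b:
                break
            candidate = merge(trans, accepting, a, b)
            if not any(accepts(*candidate, w) for w in negatives):
                trans, accepting = candidate
                break
    return trans, accepting


# Hypothetical user-provided examples for the language (ab)*
positives = ["", "ab", "abab"]
negatives = ["a", "b", "ba", "aa", "bb", "aba", "abb"]
trans, accepting = learn(positives, negatives)
print(trans, accepting)
print(accepts(trans, accepting, "ababab"))
```

With these examples the learner collapses the five-state prefix tree into a two-state automaton that generalizes to strings it has never seen, such as "ababab". The result depends heavily on the negative examples: with none at all, everything merges into a single state that accepts every string over the symbols seen. A learned automaton can be turned back into a regular expression by state elimination, which is omitted here.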