I.e., I get a list of words and I want to construct a simple regular expression from that which matches at least all of the words (but maybe more).
I want to have an algorithm for that. I.e. input of that algorithm is a list of words and output is a regular expression. Obviously, there will be some restrictions. Like either the regular expression will always match more words if it should match an infinite amounts of words and I only give it a finite number of words. Or I will need some more compact representation of the input. Or I am also thinking about giving me some regular expression as input and a list of additional words and I want to get a regular expression which matches all of them together (and maybe more). In any case, it should try to construct a regular expression which is as simple as possible.
What techniques are availalbe which can do that?
I was quite misunderstood. I know the general principles behind regular expressions. I know what it is. And in most cases I can come up quite easily with a regular expression to some language by hand. But I am searching for algorithms which does that.
Again formulated a bit different:
Let L be a regular language. Let M_n be a finite subset of L with n elements. Let M_n be a subset of M_(n+1).
I want to have an algorithm LRE which gets a finite set of words and outputs a regular expression. And I want to have the property:
lim_n->infinity | diff( LRE(M_n), L ) | = 0
This problem has been looked at the last decade. You might want to google DFA learning, and download a couple of papers to get a sense of the state of the art.
Once you have the DFA generating a regular expression is trivial. To avoid the problems @FrustratedWithDesign mentions some conditions such as generating the DFA with the least amount of nodes is introduced, from a machine learning point of view this is similar to having a regularization condition for the simplest hypothesis.