Search code examples
algorithmsignatureformal-languages

Generate shorthand function signature


Let's assume a function foo() with the following four overloads:

foo(a, b)
foo(a, b, d)
foo(a, c)
foo(a, c, d)

I want to generate a concise string that represents all overloads at once. In this case, the result should be foo(a, (b|c), [d]).

Edit: There is usually more than one concise representation. My goal is to get a representation that is as short as possible, counting only parameters. So foo(a, (b|c), [d]) has length 4 and is thus better than foo(a, ((b, [d])|(c, [d]))), which has length 5.

Is there an existing algorithm to solve this (or a similar) problem?

If not, can anyone sketch an approach?

I'm not picky about the programming language (I'm using C#, though).

The rules are:

  • Parameters with the same name represent the same thing for all overloads. a is a, b is b...
  • When collecting all distinct parameters over all overloads (in this case, a, b, c, d), every overload will adhere to this parameter order.
  • [...] means that the enclosed sub-expression can be omitted as a whole.
  • (...|...|...) means a choice of one of the sub-expressions. For readability's sake, such a sub-expression must not be empty.

To illustrate further: The (rather contrived) function bar()

bar(a, b,          f, g, h, i)
bar(a, b,          f, g, h)
bar(a, b,          f, g)

bar(a,    c,          g, h, i)
bar(a,    c,          g, h)
bar(a,    c,          g)

bar(a,       d,    f, g, h, i)
bar(a,       d,    f, g, h)
bar(a,       d,    f, g)

bar(a,          e, f, g, h, i)
bar(a,          e, f, g, h)
bar(a,          e, f, g)

should be represented as bar(a, (((b|d|e), f)|c), g, [h, [i]]).


Solution

  • First, let's assign some nomenclature.

    • [...] is an Option.
    • ..., ... is a Sequence.
    • ... | ... is a Choice.

    It seems the problem is tricky for two reasons. First, the syntax just isn't the same as in a Boolean expressions. For instance, although a Choice is similar to an OR, it means "take any one", not "take at least one". So an algorithm that generates an optimal Boolean expression might result in a suboptimal result once "translated" to our syntax.

    Second, the optimal solution might be something like a Sequence within a Choice within a Sequence within an Option. So any algorithm that can only create one kind of structure (like a Choice of Sequences) cannot possibly always return an optimal solution.

    The following describes the solution I've found. There is also a working implementation.

    First, we need to create a list of all distinct parameters over all overloads. As per the question, each overload will stick to this parameter order. So each overload can be represented as a Boolean array where each entry indicates whether the corresponding parameter is present. Now, the parameter list along with the list of overloads is given to a recursive function that works like this:

    1. Remove duplicate overloads, so that each overload is distinct.
    2. If there is only one overload, return a Sequence of the used parameters.
    3. If one of the overloads is empty: call the function recursively with all other overloads and return the result within an Option.
    4. Split the parameter list into constant areas (that are the same for all overloads) and independent areas. Each independent area should be kept as short as possible. An area is independent if you can take any overload and replace the flags within that area with those from any other overload. If this splitting results in at least one constant area, return a Sequence containing the constant parts and recurse for the independent areas.
    5. If all this fails, it may be because the overloads are too dissimilar. So create multiple groups of similar overloads. To do this by brute force, generate all partitions (see How to find all partitions of a set). For each group of overloads within each partition, call the function recursively and combine the results with a Choice. Then return the shortest of these Choices.

    I believe that for the reasons stated above, an algorithm that finds optimal solutions cannot be much simpler. I cannot prove that this algorithm does in fact always find the optimal solution, but for the signatures I've tried so far, it did.