Search code examples
assemblycompiler-theorycomputation-theoryinformation-theory

Subroutine inference


Is there any paper describing any algorithm/technique to infer subroutines from a compiled program? In other words: is there an algorithm to find blocks of code that appear more than once in the program? These blocks could have the instructions reordered (without program behavior change, of course) so that it's more likely to find a match.

This process can be seen as the opposite of subroutine inlining that is done by compilers to avoid calls, but increasing the binary size.

It seems to me that this is a very hard theoretical problem.


Solution

  • Well, it's an interesting problem. People did actually work on this. A quick search returns these two:

    But there are probably many more. You could use Google Scholar to find more recent papers that reference these old ones.