I have this C# code sample (but language is absolutely not important) :
public static void NestedSimple(int[] a, int n)
{
for(int i = 0; i < n && i < 12; i++)
{
a[i] += 1;
}
}
Once compiled, I get this IL :
IL_0000: nop
IL_0001: ldc.i4.0
IL_0002: stloc.0
IL_0003: br.s IL_0017
// loop start (head: IL_0017)
IL_0005: nop
IL_0006: ldarg.0
IL_0007: ldloc.0
IL_0008: ldelema [mscorlib]System.Int32
IL_000d: dup
IL_000e: ldind.i4
IL_000f: ldc.i4.1
IL_0010: add
IL_0011: stind.i4
IL_0012: nop
IL_0013: ldloc.0
IL_0014: ldc.i4.1
IL_0015: add
IL_0016: stloc.0
IL_0017: ldloc.0
IL_0018: ldarg.1
IL_0019: bge.s IL_0022
IL_001b: ldloc.0
IL_001c: ldc.i4.s 12
IL_001e: clt
IL_0020: br.s IL_0023
IL_0022: ldc.i4.0
IL_0023: stloc.1
IL_0024: ldloc.1
IL_0025: brtrue.s IL_0005
// end loop
IL_0027: ret
which can be represented by this Control Flow Graph (reconstructed from IL, basic blocks are properly identified and domination relations computed by Lengauer-Tarjan) :
The back edge is IL_0005 -> IL_0017 as IL_0005 dominates IL_0017 which means loop header is IL_0017.
How can I identify that every block between IL_0017 and IL_0023 belong to the loop exit condition and not the plain body ?
My intuition is that IL_0005 is the actual body and the only edge going to IL_0005 starts from IL_0023, meaning that all blocks between IL_0017 and IL_0023 are part of the condition.
However -> it doesn't rely on theory and depends on C# -> MSIL compiler behavior.
Is there something solid that I can implement ?
So far, my code looks like :
// if back edge (n->d where d dominates n) then we have a loop
if ((block.JumpTarget != null && block.JumpTarget.Dominates(block)) ||
(block.Next != null && block.Next.Dominates(block)))
{
var header = block.JumpTarget != null ? block.JumpTarget : block.Next;
var tail = block;
var visited = new HashSet<BasicBlock> { header, tail};
var stack = new Stack<BasicBlock>();
// populate body
stack.Push(block);
while (stack.Any())
{
var n = stack.Pop();
if(!visited.Contains(n))
{
visited.Add(n);
foreach(var p in n.Predecessors)
{
stack.Push(p);
}
}
}
var body = visited.OrderBy(b => b.Id).ToList();
/// HERE : how can I identify loop condition ??
I don't if this answer your question, but it's possible to unroll the loop like the following (practically "extracting" the boolean condition)
IL_000
|
IL_0017
| \
IL_001b IL_0022
| /
- IL_0023 ----------.
' | |
| IL_0005 |
| | |
| IL_0017 |
| | \ |
| IL_001b IL_0022 |
' / / |
\--------/ |
|
IL_0027
you see that (i < n && i <12)
is encoded inside the 17
, 1b
, 22
and the if
by the 23
block, becoming the head of the loop.
My reasoning is that a while
loop is encoded by the following graph
I don't know an algorithm with only basic blocks, decompilers usually "simplify" them: in your case all the blocks I have extracted could be collapsed into a single statemement and merged into IL_0023
.