Is there any way to combine Funcs into a Func has one more dimension?

I started to learn Halide from last month. And finally encounterd big problem for me.

I'm trying to implement function like following C-like code in Halide.

for( int y = 0; y < 3; ++y ){
    for( int x = 0; x < 3; ++x ){
        out(x, y) = out(x-1, y-1) + 1;
    }
}

so assuming initial image is below.

0 0 0
0 0 0
0 0 0

output image will be …(0 out of bound)

1 1 1
1 2 2
1 2 3

so I thought two possible solutions.

・Solution1

Define above algorithm like this recursive function.

Func algorithm1(Func input, int n)
{
    Func src, clamped, dst;
    Var x, y;

    if(n == 1){
            src = input;
    }else{
            src = algorithm1(input, n-1);
            src.compute_root();
    }

    clamped = BoundaryConditions::constant_exterior(src, 0, 0, SIZE, 0, SIZE);
    dst(x, y) = clamped(x-1, y-1) + 1;

    return dst;
}

And use above function like following code.

Func input, output;
input(x, y) = src(x, y);
output = algorithm1(input, SIZE);
output.realize(src);

This implementation barely works. But obviously rebundunt. Because most of the computation result of the each stage(Func) are not match the final result although each Func computes across over entire image. And I need to handle more large(normal) images. So I thought another possible solution.

・Solution2

At first of second solution.
Declare a function defines relationship between one column and another one.

Func algorithm2(Func src)
{
    Func clamped, dst;
    Var x;

    clamped = BoundaryConditions::constant_exterior(src, 0, 0, SIZE);
    dst(x) = clamped(x-1) + 1;

    return dst;
}

Then, let's combine this.

Func output[3];
output[0](x) = cast<uint32_t>(0);
for(int i = 1; i < SIZE; ++i){
    output[i] = algorithm2(output[i-1]);
}

Alright... Here's the problem. How can I combine this array of Funcs as a Func?

Of cource, I can get an Image if I realize this array of Funcs at each func to an pointer of the column's head. But What if I want to pass it to the next Func?

I looked around entire Halide examples(test, apps) these days. But I think there's no similar example. And you might already noticed my discomfort of English, actually I'm a japanese. So if there are useful example for this problem, I'm so sorry in advance. If so, please tell me where it is. If there's another good implementation idea, please teach me. Anyway I need someone's help!

I appreciate for your reading.

[edit 2]

edit 1 is my foolish question. I can schedule it compute_root(). I've decided to left them on here, really embarrassing though. I hope this will be helpful to another foolish man.

[edit 1]

I'm appreciate to your fast and detailed response from bottom of my heart!

I'm sorry for late response, I wanted to reply to you after succeeding to implement my algorithm. However, my Halide code still doesn't work what I wanna do and got some things to confirm.

First off, I would like to tell you I realized my misunderstanding of Halide thanks to you. At first of my algorithm's implementation step, I wrote definition using only pure 'Var's. So I got following error.

All of a functions recursive references to itself must contain the same pure variables in the same places as on the left-hand-side.

I thought this error occured because of scheduling flexibility. If such definition is allowed and schedule it to split, It means that scheduling changes algorithm. This comprehension is correct? From such comprehension, although I already read reduction part of Tutorials and blur example, I misunderstood that I cannot access neighbor pixels in all of Func definitions. I don't know why though.

And reduction domain couldn't be split because of same reason. I think I got it now.

Here's another question to your code. Thanks to your Halide implementation example, I've almost succeeded to implement what I wanna do with no consideration. However, this implementation is desperately slow although I'm handling 20x20 cropped image for ease of debugging.

I'm considering this slowness is caused by reduction domain. In your example, for example when calculating the value g(10, 10), Halide calculation is scheduled from f(0, 0) to f(0, 0) and finally get there value. In the other hand, C implementation just loads the value at g(9, 9) and just increment it though. We can confirm such calculation from printing loop nest.

produce g:
  for y:
    for x:
      produce f:
        for y:
          for x:
            f(...) = ...
        for range:
          for range:
            f(...) = ...
      consume f:
        g(...) = ...

I would like to confirm that Avoiding this recomputation is impossible? and so you suggested it?

And I would like to ask you another simple question. If there is reverse-dependency like this,

for( int y = 2; y > 0; --y ){
    for( int x = 2; x > 0; --x ){
        out(x, y) = out(x+1, y+1) + 1;
    }
}

Is Halide able to express this code?

Solution

The algorithm1 and algorithm2 parts here are not very clear to me. I understand the initial problem statement and the English seems fine so I will endeavor to provide some help answering the question I think you are asking. I'll do this by illustrating a few Halide mechanisms you may not know about or that aren't obvious for use here. Hopefully this will be helpful.

First off, to map a dimension of a Halide Func to different expressions, you pretty much have to use a select statement:

Var x, y, n;
Func f_0, f_1, f_both;
f_0(x, y) = ...;
f_1(x, y) = ...;
f_both(x, y, n) = select(n == 0, f_zero, f_one);

This can be expanded to more cases via adding arguments to the select. This is more useful for piecewise computations than for recursive structures but seems the most direct answer to the question in the title.

The second mechanism is Tuple. This allows a Func to have more than one value, which can be indexed with compile time constants. I don't think this is the answer you are looking for, but i tis convered in tutorial/lesson_13_tuples.cpp .

Finally, Halide supports reductions, which are designed to handle the case in the first code example. This looks like so:

Var x, y;
Func f, g;
RDom range(0, 3, 0, 3); // Form is min/extent, not start/end

f(x, y) = 0; // Initial condition
f(range.x, range.y) = f(range.x - 1, range.y - 1) + 1;

g(x, y) = f(x, y);

Buffer<int32t> result = g.realize(3, 3);

This should produce the output from your first example. Reductions, or "update definitions" are covered in tutorial/lesson_09_update_definitions.cpp .