Tags: opengl-es, gpuimage, fragment-shader

Are there time limits on fragment shaders? (iOS, using filters with Brad Larson's GPUImage)


I am using Brad Larson's excellent GPUImage library on iOS 8, and I've run into a filter that only finishes about three-quarters of its task/fragments. It runs on a single image, but with multiple filters chained:

From the tests I am doing, it appears I am either violating a time limit or a buffer size. It strangely looks more like a time limit, even though that is probably not the case; more likely I am overflowing something somewhere.

See the image below: I am running through some normal GPUImage filters and then applying a new filter I've created at the end, which takes a good 5 to 8 seconds to complete. If I reduce the number of loop iterations inside the new filter's fragment shader, it runs faster and the last filter finishes, even though it uses the same amount of buffer space (I believe).

(also see the code of the fragment shader below)

If I leave the filter the way I want it, the image below is the result: it stops about three-quarters finished, and, strangely, you can see the third-to-last filter underneath (a GPUImageDirectionalSobelEdgeDetectionFilter, rather than the second-to-last filter, a GPUImageDirectionalNonMaximumSuppressionFilter).

I couldn't find anything in Brad Larson's code that imposes a limit on buffer size or time.

Does it look like I am overflowing a buffer or running into some other limit? Remember, I can get this filter to finish simply by cutting down on some loops in the last fragment shader, without changing anything else. The loops are not filling any buffers, only calculating some floating point numbers and vecs (possibly overflowing one somehow?).

(EDIT: it is possible that some buffer/image space is being deallocated mid-render; because the process takes so long, something may have time to deallocate/free it?)

[Image: the filter chain's output, stopped about three-quarters of the way through, with the GPUImageDirectionalSobelEdgeDetectionFilter result visible underneath]

Below is some of Brad's debug output showing the compile and link times of the programs/filters:

    Core Graphics drawing time: 731.258035
    GLProgram Compiled in 5.171001 ms
    GLProgram Compiled in 2.515018 ms
    GLProgram Linked in 5.878985 ms
    GLProgram Compiled in 0.092983 ms
    GLProgram Compiled in 0.181973 ms
    GLProgram Linked in 1.731992 ms
    GLProgram Compiled in 0.275016 ms
    GLProgram Compiled in 0.414014 ms
    GLProgram Linked in 1.176000 ms
    GLProgram Compiled in 0.074029 ms
    GLProgram Compiled in 0.380039 ms
    GLProgram Linked in 0.957966 ms
    GLProgram Compiled in 0.078022 ms
    GLProgram Compiled in 1.359999 ms
    GLProgram Linked in 5.873978 ms

Here is the code that runs the filter, and, further below, a partial listing of the fragment shader: the loop portion, which I can adjust in any number of ways to make it run for less time, in which case the filter finishes. What I leave out (represented by etc... etc...) is more of the same inside the loops:

    // Feed the source image into the new filter and capture its output.
    [sourcePicture addTarget:theNewFilter];
    [theNewFilter useNextFrameForImageCapture];

    [sourcePicture processImage];

    // Read the filtered result back into a UIImage and display it.
    UIImage *currentFilteredVideoFrame = [theNewFilter imageFromCurrentFramebuffer];

    [self.zoomView setImage:currentFilteredVideoFrame];

and the fragment shader:

(
    precision mediump float;

    uniform sampler2D inputImageTexture;
    varying mediump vec2 textureCoordinate;

    uniform mediump float texelWidth;
    uniform mediump float texelHeight;

    uniform mediump float texelWidthX2;
    uniform mediump float texelHeightX2;

    const int numOfConvolutions = 7;

    uniform int sAMPLES[numOfConvolutions];


    const int sAMPLES0 = 17;
    const int sAMPLES1 = 32;
    const int sAMPLES2 = 30;
    const int sAMPLES3 = 32;
    const int sAMPLES4 = 32;
    const int sAMPLES5 = 32;
    const int sAMPLES6 = 32;


    uniform mediump float convolutionCriteria[numOfConvolutions];

    uniform mediump vec3 pos0Weight[sAMPLES0];
    uniform mediump vec3 pos1Weight[sAMPLES1];
    uniform mediump vec3 pos2Weight[sAMPLES2];
    uniform mediump vec3 pos3Weight[sAMPLES3];
    uniform mediump vec3 pos4Weight[sAMPLES4];
    uniform mediump vec3 pos5Weight[sAMPLES5];
    uniform mediump vec3 pos6Weight[sAMPLES6];


 void main()
 {
    mediump vec4 textureColor = texture2D(inputImageTexture, textureCoordinate);

    mediump vec3 weightStep;

    mediump vec2 currentStep1;
    mediump vec2 currentStep2;

    mediump vec2 sideStepRight;
    mediump vec2 sideStepLeft;
    mediump vec2 bottomStep;
    mediump vec2 topStep;

    mediump float currentColorf;
    mediump float finalColorf1 = 0.0;
    mediump float finalColorf2 = 0.0;

    mediump float totalColor1f = 0.0;
    mediump float totalColor2f = 0.0;


    mediump float rightSideColorBotf;
    mediump float rightSideColorTopf;
    mediump float leftSideColorBotf;
    mediump float leftSideColorTopf;

    mediump float bottomRightSideColorf;
    mediump float topRightSideColorf;
    mediump float bottomLeftSideColorf;
    mediump float topLeftSideColorf;

    mediump vec2 currentCoordinate;


    if (textureColor.r > 0.02)
    {
        for (int j = 0; j < (numOfConvolutions - 1); j++)
        {
            totalColor2f = 0.0;
            totalColor1f = 0.0;
            for (int i = 2; i < sAMPLES[j]; i++)
            {
                     if (j == 0) weightStep = pos0Weight[i];
                else if (j == 1) weightStep = pos1Weight[i];
                else if (j == 2) weightStep = pos2Weight[i];
                else if (j == 3) weightStep = pos3Weight[i];
                else if (j == 4) weightStep = pos4Weight[i];
                else if (j == 5) weightStep = pos5Weight[i];



                sideStepLeft  = vec2(weightStep.x - texelWidthX2,  weightStep.y);
                currentStep1  = vec2(weightStep.x,                 weightStep.y);
                sideStepRight = vec2(weightStep.x + texelWidthX2,  weightStep.y);

                topStep      = vec2(weightStep.y,  -weightStep.x - texelHeightX2);
                currentStep2 = vec2(weightStep.y,  -weightStep.x);
                bottomStep   = vec2(weightStep.y,  -weightStep.x + texelHeightX2);




                //------------ Bottom first arm Side step right ---------------
                currentCoordinate = textureCoordinate.xy + sideStepRight;
                rightSideColorBotf = texture2D(inputImageTexture, currentCoordinate).r * weightStep.z;

                //------------ top half first arm Side step right ---------------
                currentCoordinate = textureCoordinate.xy - sideStepRight;
                rightSideColorTopf = texture2D(inputImageTexture, currentCoordinate).r * weightStep.z;



                //------------ Bottom first arm Side step left ----------

etc.... etc.... etc.....
                //------------ left half second arm ---------------
                currentCoordinate = textureCoordinate.xy - currentStep2;
                currentColorf = texture2D(inputImageTexture, currentCoordinate).r * weightStep.z;

                totalColor2f += currentColorf - (bottomLeftSideColorf + topLeftSideColorf);
            }

                 if (totalColor2f > convolutionCriteria[j]) {finalColorf2 = totalColor2f; break;}
            else if (totalColor1f > convolutionCriteria[j]) {finalColorf1 = totalColor1f; break;}
        }

    if ((finalColorf2 < 0.01) && (finalColorf1 < 0.01))
    {
        for (int j = 1; j < (numOfConvolutions - 1); j++)
        {
            totalColor2f = 0.0;
            totalColor1f = 0.0;

            for (int i = 2; i < sAMPLES[j]; i++)
            {
                     if (j == 1) weightStep = pos1Weight[i];
                else if (j == 2) weightStep = pos2Weight[i];
                else if (j == 3) weightStep = pos3Weight[i];
                else if (j == 4) weightStep = pos4Weight[i];
                else if (j == 5) weightStep = pos5Weight[i];


                sideStepLeft  = vec2(-weightStep.x - texelWidthX2,  weightStep.y);
                currentStep1  = vec2(-weightStep.x,                 weightStep.y);
                sideStepRight = vec2(-weightStep.x + texelWidthX2,  weightStep.y);

                topStep      = vec2(weightStep.y,   weightStep.x - texelHeightX2);
                currentStep2 = vec2(weightStep.y,   weightStep.x);
                bottomStep   = vec2(weightStep.y,   weightStep.x + texelHeightX2);



                //------------ Bottom first arm Side step right ---------------
                currentCoordinate = textureCoordinate.xy + sideStepRight;
                rightSideColorBotf = texture2D(inputImageTexture, currentCoordinate).r * weightStep.z;

                //------------ top half first arm Side step right ---------------
                currentCoordinate = textureCoordinate.xy - sideStepRight;
                rightSideColorTopf = texture2D(inputImageTexture, currentCoordinate).r * weightStep.z;



                //------------ Bottom first arm Side step left ---------------
etc.......etc......etc.....
                //------------ left half second arm ---------------
                currentCoordinate = textureCoordinate.xy - currentStep2;
                currentColorf = texture2D(inputImageTexture, currentCoordinate).r * weightStep.z;

                totalColor2f += currentColorf - (bottomLeftSideColorf + topLeftSideColorf);
            }

                 if (totalColor2f > convolutionCriteria[j]) {finalColorf2 = totalColor2f; break;}
            else if (totalColor1f > convolutionCriteria[j]) {finalColorf1 = totalColor1f; break;}
        }
    }
    }

    if (finalColorf2 > 0.01)
    {
        gl_FragColor = vec4(textureColor.r * 1.6, 0.0, 0.0, 1.0);
    }
    else if (finalColorf1 > 0.01)
    {
        gl_FragColor = vec4(0.0, 0.0, textureColor.r * 1.6, 1.0);
    }
    else
    {
        gl_FragColor = textureColor;
    }

} );


Solution

  • OK, I finally determined that it was a hardware limit of some sort, something like GL_MAX_FRAGMENT_UNIFORM_COMPONENTS, though not necessarily that exact one; there are so many limits, and so many variations/combinations of them, that I could not be certain which. Notably, the fragment shader never failed to compile on the device, which I had mistakenly assumed would happen if I went over a limit like this (the limits can also be queried directly; see the sketch after the next point).

    The way I determined this was the problem: I ran the exact same code on an iPad mini 2 with Retina display and on an iPad mini 1. The iPad mini 2 finished the fragment shader with no problem, and even handled a photo twice as large as the one above, with no change in code. It looks like I am going to have to limit the hardware the app can run on.
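For anyone hitting the same wall: the actual limits can be queried at runtime. Below is a minimal sketch of my own (not from the original post or from GPUImage). Note that GL_MAX_FRAGMENT_UNIFORM_COMPONENTS is a desktop GL / ES 3.0 constant; under OpenGL ES 2.0 the equivalent query is GL_MAX_FRAGMENT_UNIFORM_VECTORS. It must be called while a valid GL context is current:

    #import <Foundation/Foundation.h>
    #import <OpenGLES/ES2/gl.h>

    // Log the per-device shader limits. Call with an EAGL context current
    // (for GPUImage, on its video processing queue with its context active).
    static void logShaderLimits(void)
    {
        GLint maxFragmentUniformVectors = 0;
        GLint maxVertexUniformVectors = 0;
        GLint maxVaryingVectors = 0;

        glGetIntegerv(GL_MAX_FRAGMENT_UNIFORM_VECTORS, &maxFragmentUniformVectors);
        glGetIntegerv(GL_MAX_VERTEX_UNIFORM_VECTORS, &maxVertexUniformVectors);
        glGetIntegerv(GL_MAX_VARYING_VECTORS, &maxVaryingVectors);

        NSLog(@"max fragment uniform vectors: %d", maxFragmentUniformVectors);
        NSLog(@"max vertex uniform vectors:   %d", maxVertexUniformVectors);
        NSLog(@"max varying vectors:          %d", maxVaryingVectors);
    }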
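A rough count of the shader's uniform storage (my own estimate, not something the compiler reported) fits this explanation: each element of a vec3 uniform array typically occupies a full vec4 register, so the pos0Weight through pos6Weight arrays alone request 17 + 32 + 30 + 32 + 32 + 32 + 32 = 207 uniform vectors, and sAMPLES, convolutionCriteria, and the texel-size floats push the total to roughly 225. ES 2.0-class iOS GPUs like the iPad mini 1's commonly report 64 for GL_MAX_FRAGMENT_UNIFORM_VECTORS, while ES 3.0-class hardware like the iPad mini 2's A7 guarantees at least 224, which would explain why the same shader behaves on one device and not the other.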