I am writing an Android RenderScript program, and it has a strange performance problem.
The program has two parts: Part 1 mixes two images, and Part 2 applies a special effect.
I tested Part 1 on its own, and it takes 20 ms; adding Part 2 takes 10 ms more. However, when I return the result from Part 2, the overall execution time becomes 200 ms.
Here is the code:
    static uchar4 effect(uint32_t x, uint32_t y) {
        /* Part 1: Mix two images */
        const uchar4 *element1 = rsGetElementAt(gIn1, x, y);
        const uchar4 *element2 = rsGetElementAt(gIn2, x, y);
        float4 inColor1 = rsUnpackColor8888(*element1);
        float4 inColor2 = rsUnpackColor8888(*element2);
        float4 mixed = inColor1 * 0.5 + inColor2 * 0.5;

        /* Part 2: Special Effect */
        /* a lot of computation here ... */
        /* a lot of computation here ... */
        /* a lot of computation here ... */
        float4 effect = ...; /* computation result */
        float4 mixedEffect = mixed * 0.5 + effect * 0.5;

        /* Output result */
        return rsPackColorTo8888(mixed); // fast
        // return rsPackColorTo8888(mixedEffect); // very slow
    }

    void root(const uchar4 *v_in, uchar4 *v_out, const void *usrData,
              uint32_t x, uint32_t y) {
        *v_out = effect(x, y);
    }
I ran three tests:
1) Only the Part 1 image-mixing code, returning the mixed float4: 20 ms
2) Both Part 1 and Part 2 code, returning the mixed float4: 30 ms
3) Both Part 1 and Part 2 code, returning the mixedEffect float4: 200 ms
Between the second and third tests, the only change is which variable is returned, yet the overall performance becomes much worse. Does anyone have an idea why this is happening?
I think two things are happening. First, when you don't return the value derived from mixedEffect, the compiler can eliminate Part 2 as dead code, since its result is never used. Assume that any code computing an unused value will not actually run. If so, your second test is probably not measuring Part 2 at all, and the third test is closer to its real cost.
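To illustrate the dead-code point, here is a minimal sketch in plain C rather than RenderScript. The names expensive, pixel_unused and pixel_used are hypothetical stand-ins for your Part 2 and the two return variants, and whether the call is actually removed depends on the compiler and optimization level.

```c
/* Hypothetical stand-in for "Part 2": arithmetic with no side effects,
   so its only observable effect is the value it returns. */
static float expensive(float seed) {
    float acc = seed;
    for (int i = 0; i < 1000; ++i) {
        acc = acc * 0.999f + 0.001f;
    }
    return acc;
}

/* The result of expensive() never reaches the return value, so an
   optimizing compiler is free to delete the call entirely. Timing this
   function may therefore measure only the "Part 1" cost. */
float pixel_unused(float mixed) {
    float effect = expensive(mixed);
    (void)effect;   /* computed, but never used in the output */
    return mixed;
}

/* The result feeds the return value, so the work has to be done. */
float pixel_used(float mixed) {
    float effect = expensive(mixed);
    return mixed * 0.5f + effect * 0.5f;
}
```

Both functions return correct values either way; only the amount of work that survives optimization differs. That is why a benchmark of Part 2 needs its result to flow into the output.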
Second, you really want to write your float constants as 0.5f and not 0.5. An unsuffixed 0.5 is a double, which is probably not what you want, because it can cause the surrounding arithmetic to be done in double precision.
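The promotion can be seen in a small plain C sketch (scalar float rather than RenderScript's float4; the function names are made up for illustration). In C, multiplying a float by a double constant gives an expression of type double, which sizeof makes visible without evaluating anything.

```c
#include <stddef.h>

/* 0.5 has type double, so x is converted to double and the
   product has type double. */
size_t size_with_double_constant(float x) {
    return sizeof(x * 0.5);
}

/* 0.5f has type float, so the product stays a float. */
size_t size_with_float_constant(float x) {
    return sizeof(x * 0.5f);
}

/* The mixing step written with float constants throughout. */
float mix_half(float a, float b) {
    return a * 0.5f + b * 0.5f;
}
```

On a typical platform the first function returns sizeof(double) and the second sizeof(float). How much the extra precision costs depends on the device, so treat it as something to measure rather than a fixed penalty.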