My use case is rather unusual - I need to generate input for a neural network, which is an image with 16 values of type int8 per pixel. Previously I used raw OpenGL/ES to generate this input - luckily it supports the R32G32B32A32 texture format, and I had no problems packing all the data into this texture in the shaders (i.e. 16 int8 values packed as 4 channels of type int32).
Important - OpenGL worked correctly both on desktop and mobile.
Now that I'm doing the same in Unity, it works correctly on desktop, but on the same mobile devices I cannot write past the lower 16 bits of my 32 bits per channel. If I keep the higher 16 bits as zeros, everything is fine. If I try to fill the higher 16 bits with the data I need, the whole channel turns into one of two weird clamped values, like 0x0080FFFF or its inverse 0xFF7F0000. I tried both signed and unsigned data types for the texture and the shader, but it still only worked correctly on desktop.
Could you suggest any modification to the shader or configuration to make it work? Would switching to compute shaders help? Or can I somehow drop down to lower-level OpenGL without compromising Unity's framework (rendering MeshRenderer/SkinnedMeshRenderer components from a Camera object, without hassle)?
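For context, this is roughly how the offscreen render target is set up on the Unity side. This is a simplified sketch rather than my exact code - the resolution, the depth-buffer size and the class/field names are placeholders (use R32G32B32A32_SInt instead for the signed variant):
using UnityEngine;
using UnityEngine.Experimental.Rendering;

public class NeuralInputTarget : MonoBehaviour
{
    public Camera neuralCamera;  // offscreen camera that renders the neural network input
    RenderTexture neuralTexture;

    void Start()
    {
        // 32-bit unsigned integer per channel; 256x256 and the 24-bit depth buffer are placeholder values
        neuralTexture = new RenderTexture(256, 256, 24, GraphicsFormat.R32G32B32A32_UInt);
        neuralTexture.Create();
        neuralCamera.targetTexture = neuralTexture;
    }
}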
In the following shader code, you don't need to understand how the 32-bit value is computed. It basically comes down to getting 4 bytes separately through my algorithm and packing them into a single int/uint via bit shifts (implemented as multiplications, like x*1U + y*256U + z*65536U + w*16777216U).
Shader "Unlit/Neural Input"
{
Properties
{
_NeuralTex ("NeuralTexture", 2DArray) = "" {}
}
SubShader
{
Tags { "RenderType"="Opaque" }
//LOD 100
Pass {
//Tags { "LightMode" = "Never" }
Fog { Mode Off }
ZWrite On
ZTest LEqual
Cull Back
Lighting Off
CGPROGRAM
#pragma vertex vert
#pragma fragment frag
#pragma require 2darray integers
#pragma fragmentoption ARB_precision_hint_nicest
//#pragma target 5.0 // OPENGL ES of specific version
// depends on target texture's internal type
#pragma multi_compile SIGNED_INTEGERS UNSIGNED_INTEGERS
#include "UnityCG.cginc"
struct vertexInput
{
float4 vertex : POSITION;
float2 uv : TEXCOORD0;
};
struct v2f
{
float4 vertex : SV_POSITION;
float2 uv : TEXCOORD0;
};
v2f vert (vertexInput v)
{
v2f o;
o.vertex = UnityObjectToClipPos(v.vertex);
o.uv = v.uv;
return o;
}
UNITY_DECLARE_TEX2DARRAY(_NeuralTex);
#if UNSIGNED_INTEGERS
uint4 frag (v2f i) : SV_Target
#elif SIGNED_INTEGERS
int4 frag (v2f i) : SV_Target
#endif
{
uint4 packingMultiplier = uint4(1U, 256U, 65536U, 16777216U);
// 4 byte values packed into a single channel
// but if 3rd or 4th byte is set to other than 0 it renders incorrectly
uint4 latentFragment = uint4(
dot(uint4(10 , 20 , 0, 0), packingMultiplier), // ok
dot(uint4(50 , 60 , 0, 0), packingMultiplier), // ok
dot(uint4(90 , 100, 1, 0), packingMultiplier), // not ok
dot(uint4(130, 128, 0, 1), packingMultiplier) // not ok
);
#if UNSIGNED_INTEGERS
return latentFragment;
#elif SIGNED_INTEGERS
return int4(latentFragment);
#endif
}
}
ENDCG
}
}
}
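For reference, the packing is just base-256 arithmetic, so the multiplications above are equivalent to bit shifts. Roughly, the C#-side pack/unpack looks like this (an illustrative sketch, not my actual code; the class and method names are made up):
public static class NeuralPacking
{
    // Mirrors the shader: x*1 + y*256 + z*65536 + w*16777216
    public static uint Pack(byte x, byte y, byte z, byte w)
    {
        return x | ((uint)y << 8) | ((uint)z << 16) | ((uint)w << 24);
    }

    // Recovers the 4 bytes from one 32-bit channel after readback
    public static void Unpack(uint packed, out byte x, out byte y, out byte z, out byte w)
    {
        x = (byte)(packed & 0xFF);
        y = (byte)((packed >> 8) & 0xFF);
        z = (byte)((packed >> 16) & 0xFF);
        w = (byte)((packed >> 24) & 0xFF);
    }
}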
After debugging with the RenderDoc tool, I found out that the shader itself had no such issue. It correctly rendered into the R32G32B32A32_UInt texture, with all 32 bits filled correctly.
But since this was URP (Universal Render Pipeline), it has a built-in FinalBlitPass that copied this data into another texture of the same type, and that copying process ruined the precision of my data.
In Unity's source file UnityEngine.Rendering.Universal.UniversalRenderer, the URP renderer has the following code that enqueues the FinalBlitPass:
// if post-processing then we already resolved to camera target while doing post.
// Also only do final blit if camera is not rendering to RT.
bool cameraTargetResolved =
    // final PP always blit to camera target
    applyFinalPostProcessing ||
    // no final PP but we have PP stack. In that case it blit unless there are render pass after PP
    (applyPostProcessing && !hasPassesAfterPostProcessing && !hasCaptureActions) ||
    // offscreen camera rendering to a texture, we don't need a blit pass to resolve to screen
    m_ActiveCameraColorAttachment.nameID == m_XRTargetHandleAlias.nameID;

// We need final blit to resolve to screen
if (!cameraTargetResolved)
{
    m_FinalBlitPass.Setup(cameraTargetDescriptor, sourceForFinalPass);
    EnqueuePass(m_FinalBlitPass);
}
So in order to disable the final blit, I had to make sure the camera didn't use post-processing, such as anti-aliasing.
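In URP this can also be done from script through the camera's UniversalAdditionalCameraData. A minimal sketch (the helper and parameter names are mine):
using UnityEngine;
using UnityEngine.Rendering.Universal;

public static class NeuralCameraSetup
{
    // Turns off everything on this camera that would make URP resolve through post-processing
    public static void DisablePostProcessing(Camera neuralCamera)
    {
        UniversalAdditionalCameraData cameraData = neuralCamera.GetUniversalAdditionalCameraData();
        cameraData.renderPostProcessing = false;
        cameraData.antialiasing = AntialiasingMode.None;
    }
}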
However, after disabling it on my camera object, the FinalBlitPass was surprisingly still added. Then I noticed that my renderer used a Renderer Feature called AR Background Renderer Feature, which comes with AR Foundation. It seems to add some post-processing of its own, which caused the FinalBlitPass to appear for my rendering. After splitting the renderers into a default one (for the main world) and a custom one (used only for generating the neural network input), the FinalBlitPass was gone, and I can now read back the data at full precision!
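For completeness, here is roughly how the offscreen camera is pointed at the dedicated renderer and how the data is read back. Again a simplified sketch rather than my exact code - the renderer index and the names are placeholders, and AsyncGPUReadback must be supported by the target platform:
using Unity.Collections;
using UnityEngine;
using UnityEngine.Rendering;
using UnityEngine.Rendering.Universal;

public class NeuralInputReader : MonoBehaviour
{
    public Camera neuralCamera;          // offscreen camera with the integer target texture
    public RenderTexture neuralTexture;  // the R32G32B32A32 render target
    public int neuralRendererIndex = 1;  // index of the renderer without the AR Background Renderer Feature

    void Start()
    {
        // Point the camera at the dedicated renderer so no extra feature re-introduces the final blit
        neuralCamera.GetUniversalAdditionalCameraData().SetRenderer(neuralRendererIndex);
    }

    public void ReadBack()
    {
        // Asynchronous readback keeps the raw 32-bit integer channels intact
        AsyncGPUReadback.Request(neuralTexture, 0, request =>
        {
            if (request.hasError) return;
            NativeArray<uint> data = request.GetData<uint>();
            // data now holds 4 packed uints per pixel, i.e. 16 int8 values per pixel
        });
    }
}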