Tags: opengl, opengl-es, fragment-shader

"Interleaved rendering" in fragment shader


P.S. Yes, I posted this question on Computer Graphics Stack Exchange as well, but I'm also posting it here in the hope that more people will see it.

Intro

I'm trying to render multi-channel images (more than 4 channels, for the purpose of feeding them to a neural network). Since OpenGL doesn't support this natively, I have multiple 4-channel render buffers, and I render the corresponding portion of the channels into each one.

For example, I need a multi-channel image of size 512 x 512 x 16, so in OpenGL I have 4 render buffers of size 512 x 512 x 4. Now the problem is that the neural network expects the data with strides 512 x 512 x 16, i.e. the 16 channel values of one pixel are followed by the 16 channel values of the next pixel. Currently I can efficiently read my 4 render buffers via 4 calls to glReadPixels, but that gives the data strides of 4 x 512 x 512 x 4. Manually reordering the data on the client side will not do, as it's too slow.
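For illustration, here is a minimal sketch of the shader side of such a setup (the names chunk0..chunk3 and vUV are placeholders, not my actual code; the dummy values stand in for the real per-channel computations). A single fragment shader writes four RGBA outputs, each bound to one of the four 512 x 512 render buffers via glDrawBuffers:

```glsl
#version 330 core

in vec2 vUV;  // some interpolated vertex attribute (placeholder)

// One RGBA output per 4-channel render buffer attached to the FBO
layout(location = 0) out vec4 chunk0;  // channels  0..3
layout(location = 1) out vec4 chunk1;  // channels  4..7
layout(location = 2) out vec4 chunk2;  // channels  8..11
layout(location = 3) out vec4 chunk3;  // channels 12..15

void main() {
    // Dummy per-channel values; in practice each chunk holds 4 of the
    // 16 feature channels computed for this fragment.
    chunk0 = vec4(vUV, 0.0, 1.0);
    chunk1 = vec4(1.0 - vUV, 0.0, 1.0);
    chunk2 = vec4(vUV.x);
    chunk3 = vec4(vUV.y);
}
```

Reading these back takes a glReadBuffer(GL_COLOR_ATTACHMENTi) + glReadPixels pair per attachment, which is what produces the 4 x 512 x 512 x 4 layout described above.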

Main question

I've got an idea: render to a single 4-channel render buffer of size (512*4) x 512 x 4, because stride-wise it's equivalent to 512 x 512 x 16 (each row holds 512 * 4 pixels * 4 channels = 512 * 16 values); we just treat every group of 4 consecutive pixels in a row as a single pixel of the 16-channel output image. Let's call this "interleaved rendering".

But this requires me to magically adjust my fragment shader so that every group of 4 consecutive fragments gets exactly the same interpolation of the vertex attributes. Is there any way to do that?

This rough illustration, with a single render buffer holding a 1024 x 512 4-channel image, is an example of how it should be rendered. With that layout I can extract the data with stride 512 x 512 x 8 in one glReadPixels call.

[image]

EDIT: better pictures.

What I have now (4 render buffers):

[image]

What I want to do natively in OpenGL (this image was made offline in Python):

[image]


Solution

  • But this requires me to magically adjust my fragment shader so that every group of 4 consecutive fragments gets exactly the same interpolation of the vertex attributes.

    No, it would require a bit more than that. You have to fundamentally change how rasterization works.

    Rendering at 4x the width is rendering at 4x the width. That means stretching the resulting primitives, relative to a square area. But that's not the effect you want. You need the rasterizer to rasterize at the original resolution, then replicate the rasterization products.

    That's not possible.

    From the comments:

    It just occurred to me that I can try to get a 512 x 512 x 2 image of texture coordinates from the vertex+fragment shaders, then stitch it with itself to make it 4 times wider (thus we'll get the same interpolation), and from that form the final image

    This is a good idea. You'll need to render whatever interpolated values you need to a texture at the original size, similar to how deferred rendering works. So it may be more than just 2 values. You could just store the gl_FragCoord.xy values and then use them to compute whatever you need, but it's probably easier to store the interpolated values directly.
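    As a rough sketch of that first pass (the names vUV and interpAttribs are placeholders, not from the answer above): rasterize once at the original 512 x 512 resolution and write out the interpolated attributes you will need later.

    ```glsl
    #version 330 core

    // Pass 1 (sketch): rasterize at the original 512 x 512 resolution and
    // store the interpolated attributes, G-buffer style.
    in vec2 vUV;  // interpolated texture coordinates (placeholder name)

    layout(location = 0) out vec4 interpAttribs;

    void main() {
        // Store the interpolated UVs; gl_FragCoord.xy could be stored
        // instead if everything is recomputed in the second pass.
        interpAttribs = vec4(vUV, 0.0, 1.0);
    }
    ```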

    I would suggest doing a texelFetch when reading the texture, as you can specify exact integer texel coordinates. The integer coordinates you need can be computed from gl_FragCoord as follows:

    ivec2 texCoords = ivec2(int(gl_FragCoord.x * 0.25f), int(gl_FragCoord.y));
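    Putting that line into context, a second pass over a full-screen quad drawn into the 2048 x 512 target could look like this (a sketch; interpTex and the channel-group selection via gl_FragCoord.x % 4 are assumptions, not spelled out in the answer):

    ```glsl
    #version 330 core

    // Pass 2 (sketch): full-screen pass over the 2048 x 512 interleaved target.
    uniform sampler2D interpTex;  // 512 x 512 texture from pass 1 (assumed name)

    layout(location = 0) out vec4 outColor;

    void main() {
        // Every group of 4 consecutive output pixels maps to one source texel.
        ivec2 texCoords = ivec2(int(gl_FragCoord.x * 0.25f), int(gl_FragCoord.y));
        vec2 uv = texelFetch(interpTex, texCoords, 0).xy;

        // Which 4-channel group (0..3) of the 16 channels this fragment writes.
        int group = int(gl_FragCoord.x) % 4;

        // Hypothetical: compute the 4 channels of this group from the fetched UVs.
        if      (group == 0) outColor = vec4(uv, 0.0, 1.0);
        else if (group == 1) outColor = vec4(1.0 - uv, 0.0, 1.0);
        else if (group == 2) outColor = vec4(uv.x);
        else                 outColor = vec4(uv.y);
    }
    ```

    Each source texel is then visited by 4 consecutive output fragments, each writing a different 4-channel group, which gives the interleaved 512 x 512 x 16 layout after a single glReadPixels.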