I'm working on a streaming prototype using UE4. My goal in this post is solely to capture frames and save one as a bitmap, just to visually check that frames are captured correctly.
I'm currently capturing frames by converting the backbuffer to an ID3D11Texture2D and then mapping it.
Note : I tried the ReadSurfaceData approach in the render thread, but it performed poorly (FPS went down to 15 and I'd like to capture at 60 FPS), whereas the DirectX texture mapping from the backbuffer currently takes 1 to 3 milliseconds.
When debugging, I can see the D3D11_TEXTURE2D_DESC's format is DXGI_FORMAT_R10G10B10A2_UNORM, so red, green and blue are stored on 10 bits each, and alpha on 2 bits.
My questions :
What I've tried :
All the following code is executed in a callback function registered to OnBackBufferReadyToPresent (code below).
void* NativeResource = BackBuffer->GetNativeResource();
if (NativeResource == nullptr)
{
UE_LOG(LogTemp, Error, TEXT("Couldn't retrieve native resource"));
return;
}
ID3D11Texture2D* BackBufferTexture = static_cast<ID3D11Texture2D*>(NativeResource);
D3D11_TEXTURE2D_DESC BackBufferTextureDesc;
BackBufferTexture->GetDesc(&BackBufferTextureDesc);
// Get the device context
ID3D11Device* d3dDevice;
BackBufferTexture->GetDevice(&d3dDevice);
ID3D11DeviceContext* d3dContext;
d3dDevice->GetImmediateContext(&d3dContext);
// Staging resource
ID3D11Texture2D* StagingTexture;
D3D11_TEXTURE2D_DESC StagingTextureDesc = BackBufferTextureDesc;
StagingTextureDesc.Usage = D3D11_USAGE_STAGING;
StagingTextureDesc.BindFlags = 0;
StagingTextureDesc.CPUAccessFlags = D3D11_CPU_ACCESS_READ;
StagingTextureDesc.MiscFlags = 0;
HRESULT hr = d3dDevice->CreateTexture2D(&StagingTextureDesc, nullptr, &StagingTexture);
if (FAILED(hr))
{
UE_LOG(LogTemp, Error, TEXT("CreateTexture failed"));
}
// Copy the texture to the staging resource
d3dContext->CopyResource(StagingTexture, BackBufferTexture);
// Map the staging resource
D3D11_MAPPED_SUBRESOURCE mapInfo;
hr = d3dContext->Map(
StagingTexture,
0,
D3D11_MAP_READ,
0,
&mapInfo);
if (FAILED(hr))
{
UE_LOG(LogTemp, Error, TEXT("Map failed"));
}
// See https://dev.to/muiz6/c-how-to-write-a-bitmap-image-from-scratch-1k6m for the struct definitions & the initialization of bmpHeader and bmpInfoHeader
// I didn't copy that code here to avoid overloading this post, as it's identical to the article's code
// Just making clear the reassigned values below
bmpHeader.sizeOfBitmapFile = 54 + StagingTextureDesc.Width * StagingTextureDesc.Height * 4;
bmpInfoHeader.width = StagingTextureDesc.Width;
bmpInfoHeader.height = StagingTextureDesc.Height;
std::ofstream fout("output.bmp", std::ios::binary);
fout.write((char*)&bmpHeader, 14);
fout.write((char*)&bmpInfoHeader, 40);
// TODO : convert to R8G8B8 (see below for my attempt at this)
fout.close();
// Unmap before releasing the staging texture
d3dContext->Unmap(StagingTexture, 0);
StagingTexture->Release();
d3dContext->Release();
d3dDevice->Release();
BackBufferTexture->Release();
(As mentioned in the code comments, I followed this article about the BMP headers for saving the bitmap to a file)
Texture data
One thing I'm concerned about is the retrieved data with this method. I used a temporary array to check with the debugger what's inside.
// Just noted which width and height had the texture and hardcoded it here to allocate the right size
uint32_t data[1936 * 1056];
// Multiply by 4 as there are 4 bytes (32 bits) per pixel
memcpy(data, mapInfo.pData, StagingTextureDesc.Width * StagingTextureDesc.Height * 4);
Turns out the first 1935 uint32 values in this array all contain the same value : 3595933029. After that, the same values are often repeated hundreds of times in a row.
This makes me think the frame isn't captured as it should, because the UE4 editor's window doesn't have the exact same color on its first row all along (whether it's top or bottom).
R10G10B10A2 to R8G8B8(A8)
So I tried to guess how to convert from R10G10B10A2 to R8G8B8. I started from the value that appears 1935 times in a row at the beginning of the data buffer : 3595933029.
When I color pick an editor's window screenshot (using the Windows tool, which gets me an image with the exact same dimensions as the DirectX texture, that is 1936x1056), I get the following different colors:
So I tried to manually convert the color to check whether it matches any of those I color picked. I thought about bit shifting to simply compare the values.
3595933029 (value in retrieved buffer) in binary : 11010110010101011001010101100101
That is 11 followed 3 times by the 10-bit value 0101100101, and none of the picked colors follow this pattern (except the black corner, which would be made only of zeros though).
Assuming RRRRRRRRRR GGGGGGGGGG BBBBBBBBBB AA order (ditched bits are marked with an x) :
11010110xx01010110xx01010110xxxx
Assuming AA RRRRRRRRRR GGGGGGGGGG BBBBBBBBBB order :
xx01011001xx01011001xx01011001xx
If that can help, here's the editor window that should be captured (it really is a Third person template, I didn't add anything to it except this capture code).
Here's the generated bitmap when shifting bits.
Code to generate the bitmap's pixel data :
struct Pixel {
uint8_t blue = 0;
uint8_t green = 0;
uint8_t red = 0;
} pixel;
uint32_t* pointer = (uint32_t*)mapInfo.pData;
size_t numberOfPixels = bmpInfoHeader.width * bmpInfoHeader.height;
for (int i = 0; i < numberOfPixels; i++) {
uint32_t value = *pointer;
// Ditch the color's 2 last bits, keep the 8 first
pixel.blue = value >> 2;
pixel.green = value >> 12;
pixel.red = value >> 22;
++pointer;
fout.write((char*)&pixel, 3);
}
The colors present seem somewhat similar, but the result doesn't look like the editor at all.
What am I missing ?
First of all, you are assuming that mapInfo.RowPitch is exactly StagingTextureDesc.Width * 4. This is often not true. When copying to/from Direct3D resources, you need to do 'row-by-row' copies. Also, allocating roughly 8 MB (1936 x 1056 pixels at 4 bytes each) on the stack is not good practice.
#include <cstdint>
#include <cstring> // memcpy
#include <memory>
// Assumes our staging texture is 4 bytes-per-pixel
// Allocate temporary memory
auto data = std::unique_ptr<uint32_t[]>(
new uint32_t[StagingTextureDesc.Width * StagingTextureDesc.Height]);
auto src = static_cast<uint8_t*>(mapInfo.pData);
uint32_t* dest = data.get();
for(UINT y = 0; y < StagingTextureDesc.Height; ++y)
{
// Multiply by 4 as there are 4 bytes (32 bits) per pixel
memcpy(dest, src, StagingTextureDesc.Width * sizeof(uint32_t));
src += mapInfo.RowPitch;
dest += StagingTextureDesc.Width;
}
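To see the effect of the row pitch without a D3D11 device, here is a minimal self-contained sketch of the same row-by-row copy. The names CopyPitchedRows and DemoCopyPaddedImage are hypothetical helpers written for this illustration, and the padded byte buffer only stands in for mapInfo.pData with a RowPitch of 16.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not part of D3D11 or UE4): copies `height` rows of
// `width` 32-bit pixels from a pitched source into a tightly packed vector.
// `rowPitch` is the distance in bytes between the starts of consecutive rows.
inline std::vector<uint32_t> CopyPitchedRows(const void* mapped, size_t rowPitch,
                                             size_t width, size_t height)
{
    assert(rowPitch >= width * sizeof(uint32_t));
    std::vector<uint32_t> packed(width * height);
    auto src = static_cast<const uint8_t*>(mapped);
    for (size_t y = 0; y < height; ++y)
    {
        // Copy only the meaningful part of the row, then skip the padding
        std::memcpy(packed.data() + y * width, src + y * rowPitch,
                    width * sizeof(uint32_t));
    }
    return packed;
}

// Usage sketch: a fake 3x2 image whose rows are padded to 16 bytes.
inline std::vector<uint32_t> DemoCopyPaddedImage()
{
    const size_t width = 3, height = 2, rowPitch = 16;
    std::vector<uint8_t> mapped(rowPitch * height, 0xEE); // 0xEE marks padding
    const uint32_t row0[] = { 1, 2, 3 };
    const uint32_t row1[] = { 4, 5, 6 };
    std::memcpy(mapped.data(), row0, sizeof(row0));
    std::memcpy(mapped.data() + rowPitch, row1, sizeof(row1));
    return CopyPitchedRows(mapped.data(), rowPitch, width, height);
}
```

A single memcpy of width * height * 4 bytes over the same buffer would pull the padding bytes into the fourth pixel and shift every later row, which is the kind of skewed image described in the question.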
For C++11, using std::unique_ptr ensures the memory is eventually released automatically. You can transfer ownership of the memory to something else with uint32_t* ptr = data.release(). See cppreference.
With C++14, the better way to write the allocation is:
auto data = std::make_unique<uint32_t[]>(StagingTextureDesc.Width * StagingTextureDesc.Height);
This assumes you are fine with a C++ exception being thrown for out-of-memory.
If you want to return an error code for out-of-memory instead of a C++ exception, use:
// std::nothrow requires #include <new>
auto data = std::unique_ptr<uint32_t[]>(
    new (std::nothrow) uint32_t[StagingTextureDesc.Width * StagingTextureDesc.Height]);
if (!data)
{
    // return error
}
Converting 10:10:10:2 content to 8:8:8:8 content can be done efficiently on the CPU with bit-shifting.
The tricky bit is dealing with the up-scaling of the 2-bit alpha to 8 bits. For example, you want the alpha of 11 to map to 255, not 192.
Here's a replacement for the loop above
// Assumes our staging texture is DXGI_FORMAT_R10G10B10A2_UNORM
for(UINT y = 0; y < StagingTextureDesc.Height; ++y)
{
auto sptr = reinterpret_cast<uint32_t*>(src);
for(UINT x = 0; x < StagingTextureDesc.Width; ++x)
{
uint32_t t = *(sptr++);
uint32_t r = (t & 0x000003ff) >> 2;
uint32_t g = (t & 0x000ffc00) >> 12;
uint32_t b = (t & 0x3ff00000) >> 22;
// Upscale alpha
// 11xxxxxx -> 11111111 (255)
// 10xxxxxx -> 10101010 (170)
// 01xxxxxx -> 01010101 (85)
// 00xxxxxx -> 00000000 (0)
t &= 0xc0000000;
uint32_t a = (t >> 24) | (t >> 26) | (t >> 28) | (t >> 30);
// Convert to DXGI_FORMAT_R8G8B8A8_UNORM
*(dest++) = r | (g << 8) | (b << 16) | (a << 24);
}
src += mapInfo.RowPitch;
}
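The alpha up-scaling in the loop above can be looked at in isolation. This is a small sketch with a hypothetical helper name, ExpandAlpha2To8, that takes the alpha already shifted down to the range 0..3. It repeats the two bits four times, which gives the same result as multiplying by 85.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: expands a 2-bit alpha (0..3) to 8 bits by bit
// replication, so 3 becomes 255 instead of the 192 a plain shift would give.
inline uint8_t ExpandAlpha2To8(uint32_t a2)
{
    assert(a2 <= 3);
    return static_cast<uint8_t>((a2 << 6) | (a2 << 4) | (a2 << 2) | a2);
}
```

With this, the four possible inputs map to 0, 85, 170 and 255, matching the comments in the loop.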
Of course we can combine the shifting operations since we move them down and then back up in the previous loop. We do need to update the masks to remove the bits that are normally shifted off by the full shifts. This replaces the inner body of the loop above:
// Convert from 10:10:10:2 to 8:8:8:8
uint32_t t = *(sptr++);
uint32_t r = (t & 0x000003fc) >> 2;
uint32_t g = (t & 0x000ff000) >> 4;
uint32_t b = (t & 0x3fc00000) >> 6;
t &= 0xc0000000;
uint32_t a = t | (t >> 2) | (t >> 4) | (t >> 6);
*(dest++) = r | g | b | a;
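As a sanity check, both variants can be wrapped in functions and run on the value from the question, 3595933029 (0xD6559565). ConvertTwoStep, ConvertCombined and CheckQuestionValue are hypothetical names for this sketch. Assuming the DXGI_FORMAT_R10G10B10A2_UNORM layout used above (red in the low 10 bits, alpha in the top 2), that value decodes to a mid gray, 89/89/89, with full alpha.

```cpp
#include <cassert>
#include <cstdint>

// Two-step variant: shift each channel down to 8 bits, then repack.
inline uint32_t ConvertTwoStep(uint32_t t)
{
    uint32_t r = (t & 0x000003ff) >> 2;
    uint32_t g = (t & 0x000ffc00) >> 12;
    uint32_t b = (t & 0x3ff00000) >> 22;
    t &= 0xc0000000;
    uint32_t a = (t >> 24) | (t >> 26) | (t >> 28) | (t >> 30);
    return r | (g << 8) | (b << 16) | (a << 24);
}

// Combined variant: masks drop the low 2 bits, shifts land in place.
inline uint32_t ConvertCombined(uint32_t t)
{
    uint32_t r = (t & 0x000003fc) >> 2;
    uint32_t g = (t & 0x000ff000) >> 4;
    uint32_t b = (t & 0x3fc00000) >> 6;
    t &= 0xc0000000;
    uint32_t a = t | (t >> 2) | (t >> 4) | (t >> 6);
    return r | g | b | a;
}

// Quick self-check using the value from the question.
inline void CheckQuestionValue()
{
    const uint32_t sample = 0xD6559565u; // 3595933029
    assert(ConvertTwoStep(sample) == 0xFF595959u);  // R = G = B = 0x59 (89), A = 255
    assert(ConvertCombined(sample) == 0xFF595959u);
}
```

The result is in DXGI_FORMAT_R8G8B8A8_UNORM order, so writing it to a BMP still means swapping red and blue, since BMP pixel data is stored as blue, green, red.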
Any time you reduce the bit-depth you will introduce error. Techniques like ordered dithering and error-diffusion dithering are commonly used in pixel conversions of this nature. These introduce a bit of noise to the image to reduce the visual impact of the lost low bits.
For examples of conversions for all DXGI_FORMAT types, see DirectXTex, which makes use of DirectXMath for all the various packed vector types. DirectXTex also implements both 4x4 ordered dithering and Floyd-Steinberg error-diffusion dithering when reducing bit-depth.
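To give a rough idea of what ordered dithering does for a 10-bit to 8-bit reduction, here is a simplified sketch using a 2x2 threshold matrix. This is not DirectXTex's implementation (which uses a 4x4 matrix), and Dither10To8 is a hypothetical helper name. The two dropped bits decide how many of the four positions in each 2x2 tile round up instead of down.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: reduces one 10-bit channel value (0..1023) to 8 bits
// using a 2x2 ordered (Bayer) dither keyed on the pixel position (x, y).
inline uint8_t Dither10To8(uint32_t v10, uint32_t x, uint32_t y)
{
    assert(v10 <= 1023);
    // Thresholds 0..3 cover the range of the two bits being dropped
    static const uint32_t kBayer2x2[2][2] = { { 0, 2 }, { 3, 1 } };
    uint32_t v = (v10 + kBayer2x2[y & 1][x & 1]) >> 2;
    // Values near the top of the range can round up to 256, so clamp
    return static_cast<uint8_t>(v > 255 ? 255 : v);
}
```

For the 10-bit value 357 from the question, three positions of each tile give 89 and one gives 90, so the tile averages 89.25, which is 357 / 4. Plain truncation would give 89 everywhere.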