Transferring data from a compute pipeline's output resource to swapchain image in DirectX 12?


I am faced with the following scenario:

0) I have a compute pipeline whose output I would like to copy verbatim into the render targets exposed by the swap chain.

1) In DirectX 11, the compute pipeline could have written directly into a render target exposed by the swapchain, but one cannot do this in DirectX 12 (see discussion here: D3D12 Use backbuffer surface as unordered access view (UAV))

2) Therefore, my compute pipeline will have to write to an output resource X, which is not a render target exposed by the swapchain.

Question: what is the best/easiest way to transfer data from X, to a render target exposed by a swapchain?

There is only one possible solution I am aware of: have a "dummy" graphics pipeline which does nothing apart from take data from X and write it into a render target.


Solution

  • The best and easiest solution is to create an ID3D12Resource that matches the swapchain back buffer and use it as the output of your compute pipeline (your resource X), then call ID3D12GraphicsCommandList::CopyResource to transfer the compute shader output into the back buffer.
    Using CopyResource avoids complications and lets the runtime and drivers check that the resources are compatible for copying. According to the debug layer, the source and destination texture resources must have equivalent dimensions (width, height, depth, mip levels, and array size) and equivalent formats.
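
The compatibility rule quoted from the debug layer can be sketched as a plain comparison. This is only an illustration: TextureDesc is a hypothetical stand-in for the relevant fields of D3D12_RESOURCE_DESC, IsCopyCompatible is a made-up helper, and the real runtime checks are more detailed than this.

```cpp
#include <cstdint>

// Hypothetical stand-in for the D3D12_RESOURCE_DESC fields that matter here.
// This is NOT the real D3D12 struct; the format is modeled as a plain integer.
struct TextureDesc {
    std::uint64_t width;
    std::uint32_t height;
    std::uint16_t depthOrArraySize;
    std::uint16_t mipLevels;
    std::uint32_t format;
};

// Returns true when every property named by the debug layer message is equal.
// Usage flags (such as "allow unordered access") are deliberately not compared,
// which is why resource X can add the UAV flag and still be a valid copy source.
constexpr bool IsCopyCompatible(const TextureDesc& src, const TextureDesc& dst) {
    return src.width == dst.width
        && src.height == dst.height
        && src.depthOrArraySize == dst.depthOrArraySize
        && src.mipLevels == dst.mipLevels
        && src.format == dst.format;
}
```

A description copied from the back buffer compares as compatible under this sketch, while one with a different width or format does not.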

    Here is a complete example.

    // ... your code here ...
    // Swap-chain creation
    ComPtr<IDXGISwapChain4> swapChain;
    
    const DXGI_SWAP_CHAIN_DESC1 swapChainDesc = {
        width,
        height,
        DXGI_FORMAT_R8G8B8A8_UNORM,
        FALSE,
        {1, 0},
        DXGI_USAGE_BACK_BUFFER,
        bufferCount, 
        DXGI_SCALING_STRETCH,
        DXGI_SWAP_EFFECT_FLIP_DISCARD,
        DXGI_ALPHA_MODE_UNSPECIFIED,
        0
    };
    ComPtr<IDXGISwapChain1> swapchain1;
    ThrowIfFailed( dxgiFactory->CreateSwapChainForHwnd(
        directCommandQueue.Get(),
        hWnd,
        &swapChainDesc,
        nullptr,
        nullptr,
        &swapchain1));
    ThrowIfFailed( swapchain1.As(&swapChain));
    
    // Retrieving swapchain buffers
    ComPtr<ID3D12Resource> backBuffers[bufferCount];
    for (int i = 0; i < bufferCount; i++)
    {
        ThrowIfFailed( swapChain->GetBuffer(i, IID_PPV_ARGS(&backBuffers[i])));
    }
    
    // Declare resource
    ComPtr<ID3D12Resource> drawResource;
    // Get swapchain description by value
    auto surfaceDesc = backBuffers[0]->GetDesc();
    // Change description to allow UAV
    surfaceDesc.Flags |= D3D12_RESOURCE_FLAG_ALLOW_UNORDERED_ACCESS;
    /* Create a committed resource (placing it in your own heap may be better).
       The swapchain's heap flags and properties could have been reused, but they
       seem to be special/incompatible and may depend on the system. */
    CD3DX12_HEAP_PROPERTIES drawHeapProps{ D3D12_HEAP_TYPE_DEFAULT };
    ThrowIfFailed( device->CreateCommittedResource(
        &drawHeapProps, D3D12_HEAP_FLAG_NONE,
        &surfaceDesc, D3D12_RESOURCE_STATE_UNORDERED_ACCESS,
        nullptr, IID_PPV_ARGS(&drawResource)));
    
    // Create UAV using some handle in descriptor heap, first handle as example
    D3D12_CPU_DESCRIPTOR_HANDLE uavHandle = descriptorHeap->GetCPUDescriptorHandleForHeapStart();
    
    // Create UAV description
    D3D12_TEX2D_UAV index = {};
    index.MipSlice = 0; index.PlaneSlice = 0;
    
    D3D12_UNORDERED_ACCESS_VIEW_DESC uavViewDesc = {};
    uavViewDesc.Format = drawResource->GetDesc().Format;
    uavViewDesc.ViewDimension = D3D12_UAV_DIMENSION_TEXTURE2D;
    uavViewDesc.Texture2D = index;
    // Create UAV itself
    device->CreateUnorderedAccessView(drawResource.Get(), nullptr, &uavViewDesc, uavHandle);
    
    ...
    
    // Inside your loop
    
    UINT currBackBuffId = swapChain->GetCurrentBackBufferIndex();
    auto& backBuffer = backBuffers[currBackBuffId];
    ...
    commandList->Dispatch(...);
    ...
    
    // Transition the draw resource and the back buffer to copy-compatible states
    // (both barriers are submitted in a single ResourceBarrier call)
    const CD3DX12_RESOURCE_BARRIER preCopyBarriers[] = {
        CD3DX12_RESOURCE_BARRIER::Transition(
            drawResource.Get(),
            D3D12_RESOURCE_STATE_UNORDERED_ACCESS, D3D12_RESOURCE_STATE_COPY_SOURCE),
        CD3DX12_RESOURCE_BARRIER::Transition(
            backBuffer.Get(),
            D3D12_RESOURCE_STATE_PRESENT, D3D12_RESOURCE_STATE_COPY_DEST)
    };
    commandList->ResourceBarrier(_countof(preCopyBarriers), preCopyBarriers);
    
    // Copy itself
    commandList->CopyResource(backBuffer.Get(), drawResource.Get());
    
    // Transition back to the UAV and present states
    const CD3DX12_RESOURCE_BARRIER postCopyBarriers[] = {
        CD3DX12_RESOURCE_BARRIER::Transition(
            drawResource.Get(),
            D3D12_RESOURCE_STATE_COPY_SOURCE, D3D12_RESOURCE_STATE_UNORDERED_ACCESS),
        CD3DX12_RESOURCE_BARRIER::Transition(
            backBuffer.Get(),
            D3D12_RESOURCE_STATE_COPY_DEST, D3D12_RESOURCE_STATE_PRESENT)
    };
    commandList->ResourceBarrier(_countof(postCopyBarriers), postCopyBarriers);
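
The reason each transition above names both a "before" and an "after" state can be sketched with a small model. This is not D3D12 code: State and TrackedResource are hypothetical stand-ins, and the sketch only mimics the basic rule that a barrier's "before" state has to match the state the resource is actually in.

```cpp
#include <stdexcept>

// Hypothetical stand-ins for the four D3D12_RESOURCE_STATE_* values used above.
enum class State { Present, CopyDest, CopySource, UnorderedAccess };

// Tracks the current state of one resource, the way an application has to
// in order to fill in the "before" state of each transition correctly.
class TrackedResource {
public:
    explicit TrackedResource(State initial) : current_(initial) {}

    // Models a transition barrier: 'before' must match the tracked state.
    void Transition(State before, State after) {
        if (before != current_) {
            throw std::logic_error("'before' state does not match the current state");
        }
        current_ = after;
    }

    State Current() const { return current_; }

private:
    State current_;
};

// Replays the per-frame sequence from the example and reports whether both
// resources end the frame in the state they started in, ready for the next frame.
inline bool FrameEndsInStartingStates() {
    TrackedResource drawResource(State::UnorderedAccess);
    TrackedResource backBuffer(State::Present);

    // Before the copy.
    drawResource.Transition(State::UnorderedAccess, State::CopySource);
    backBuffer.Transition(State::Present, State::CopyDest);

    // ... the CopyResource call would be recorded here ...

    // After the copy.
    drawResource.Transition(State::CopySource, State::UnorderedAccess);
    backBuffer.Transition(State::CopyDest, State::Present);

    return drawResource.Current() == State::UnorderedAccess
        && backBuffer.Current() == State::Present;
}
```

Because both resources return to their starting states, the same sequence can be recorded unchanged every frame. Skipping or misordering a transition makes the model throw, which is roughly the kind of mistake the debug layer reports.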
    

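    It also seems unnecessary to fill in a UAV description: CreateUnorderedAccessView accepts nullptr in place of the description, in which case a default view is created from the resource's own format and dimensions.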

    I would like to speculate on why this works and why CopyResource is probably safe and reliable here. Swapchain back buffers are the same as ordinary ID3D12Resource objects, except that the so-called "presentation engine" borrows one of them to present on the screen. The swapchain creates these resources automatically with the heap flag D3D12_HEAP_FLAG_ALLOW_DISPLAY enabled. Sadly, there is little information about this (and some of it is outdated), so these are just my observations, although graphics debuggers such as NVIDIA Nsight seem to confirm them.

    Note that you can copy to swapchain back buffers only on the command queue you passed when creating the swapchain. The swapchain is tied to that queue so that Present calls are synchronized automatically, and it has to be a DIRECT command queue, because swapchain creation fails otherwise.

    Performance should be nearly (if not fully) identical to writing to the swapchain directly on most hardware. I guess that on some older drivers or hardware you might have to use double buffering or multiple queues to avoid pipeline stalls caused by the copy operations, but I have never seen this, even on very old hardware such as the GT 750M or GTS 450.