Suggestion: Improve Performance and D3D11 Compatibility in Luminance Pyramid Pass #137

@Montazeran8

Hello FidelityFX Team,

First, thank you for the amazing work on FSR 2. It's a fantastic technology.

I was working on getting the FSR 2 sample to run on older D3D11 hardware that lacks full support for typed UAV loads. Because of this hardware limitation, I hit a shader compile error in ffx_fsr2_compute_luminance_pyramid_pass.hlsl.

After some debugging, I identified two separate loads from typed UAVs that cause the error. I resolved both with shader-only changes that fix the compatibility problem and may also improve performance.

1. The rw_auto_exposure Load Issue:

  • Problem: The SPD_LoadExposureBuffer function is called within the pass, which reads from the rw_auto_exposure UAV (R32G32_FLOAT). This is a Typed UAV Load.
  • Solution: The pass file (ffx_fsr2_compute_luminance_pyramid_pass.hlsl) defines the UAV binding (FSR2_BIND_UAV_AUTO_EXPOSURE) but not the SRV binding. Adding #define FSR2_BIND_SRV_AUTO_EXPOSURE to the pass file makes the r_auto_exposure SRV available, and SPD_LoadExposureBuffer can then be modified to read from the SRV instead of the UAV.
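
The first change could look roughly like the sketch below. It is a minimal, standalone illustration and not the SDK's actual code: only the identifier names come from the description above, while the register slots, the value given to the define, and the signature of SPD_LoadExposureBuffer are assumptions. It also assumes the SRV stays bound while the pass runs. D3D11 sets an SRV slot to NULL when the same subresource is bound as a UAV, so this is worth checking on the target backend.

```hlsl
// Sketch only. Register slots and the define's value are placeholders,
// not the SDK's real binding logic.
#define FSR2_BIND_SRV_AUTO_EXPOSURE 0   // the proposal adds this define to the pass file

Texture2D<float2>   r_auto_exposure  : register(t0); // read-only view (SRV)
RWTexture2D<float2> rw_auto_exposure : register(u0); // writable view (UAV), R32G32_FLOAT

// Assumed signature: returns the exposure pair stored at texel (0, 0).
float2 SPD_LoadExposureBuffer()
{
#if defined(FSR2_BIND_SRV_AUTO_EXPOSURE)
    // SRV read: an ordinary texture load, no typed UAV load involved.
    return r_auto_exposure.Load(int3(0, 0, 0));
#else
    // Original path: a typed UAV load from a two-channel format.
    return rw_auto_exposure[int2(0, 0)];
#endif
}
```

Keeping the original UAV path behind the preprocessor check means the shader behaves as before when the SRV define is absent.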

2. The rw_img_mip_5 Load Issue:

  • Problem: The generic SPD algorithm calls the SpdLoad function, which in turn calls SPD_LoadMipmap5. This function reads from the rw_img_mip_5 UAV, another Typed UAV Load.
  • Solution: The SPD algorithm already stores intermediate results in group-shared memory (spdIntermediateR/G/B/A). By modifying the SpdLoad function in ffx_fsr2_compute_luminance_pyramid.h to read directly from this shared memory (by calling SpdLoadIntermediate), the costly and problematic global memory access is completely avoided. This also seems to align better with the optimal data flow for a parallel reduction algorithm.
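
The second change could be sketched as follows. This is again an illustration under assumptions and not the SDK's code: the function and array names come from the description above, but the 16x16 array size and the signatures are guesses. Group-shared memory is visible only within one thread group, so the sketch assumes that the requested texel was written by the same group, that the coordinates fall inside the shared tile, and that a GroupMemoryBarrierWithGroupSync() separates the writes from this read.

```hlsl
// Sketch only. Array dimensions and signatures are assumptions.
groupshared float spdIntermediateR[16][16];
groupshared float spdIntermediateG[16][16];
groupshared float spdIntermediateB[16][16];
groupshared float spdIntermediateA[16][16];

// Reads one RGBA value back from group-shared memory (LDS).
float4 SpdLoadIntermediate(uint x, uint y)
{
    return float4(spdIntermediateR[x][y],
                  spdIntermediateG[x][y],
                  spdIntermediateB[x][y],
                  spdIntermediateA[x][y]);
}

// Before (per the description): SpdLoad -> SPD_LoadMipmap5 -> rw_img_mip_5[tex],
// which is a typed UAV load from global memory.
// Proposed: serve the read from LDS instead.
float4 SpdLoad(int2 tex, uint slice)
{
    // Assumes tex lies inside the 16x16 tile owned by this thread group;
    // slice is unused in this sketch.
    return SpdLoadIntermediate(uint(tex.x), uint(tex.y));
}
```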

I have prepared a Gist containing the two modified files that implement these solutions. No C++ or backend changes are required.

Gist with the modified files:
https://gist.github.com/Montazeran8/bc037da2b7fb32cb3b4c08687071adb8

Summary of Benefits:

  • Compatibility: Completely resolves the Typed UAV Load issue, allowing the luminance pyramid pass to compile and run on older D3D11 hardware.
  • Performance: The SpdLoad modification, in particular, avoids a recurring global memory access inside the SPD reduction loop and uses the much faster group-shared memory (LDS). This should be a performance benefit on all hardware, not just older cards.

Thank you for your time and for considering this feedback. I hope this suggestion is helpful!
