Hello FidelityFX Team,
First, thank you for the amazing work on FSR 2. It's a fantastic technology.
I was working on getting the FSR 2 sample to run on older D3D11 hardware which lacks full support for Typed UAV Loads. I encountered a shader compile error in ffx_fsr2_compute_luminance_pyramid_pass.hlsl due to this hardware limitation.
After a deep dive and debugging, I identified two separate Load operations from typed UAVs that were causing the issue. I have managed to resolve both with shader-only changes that not only fix the compatibility problem but also appear to be a potential performance optimization.
1. The rw_auto_exposure Load Issue:
- Problem: The `SPD_LoadExposureBuffer` function is called within the pass, which reads from the `rw_auto_exposure` UAV (R32G32_FLOAT). This is a `Typed UAV Load`.
- Solution: The `pass.hlsl` file defines the UAV binding (`FSR2_BIND_UAV_AUTO_EXPOSURE`) but not the SRV binding. By adding `#define FSR2_BIND_SRV_AUTO_EXPOSURE` to the pass file, the `r_auto_exposure` SRV becomes available. Then, `SPD_LoadExposureBuffer` can be modified to safely read from the SRV instead.
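For illustration, the change to the exposure load could look roughly like the sketch below. It is not copied from the Gist. The stand-alone resource declaration and the `t0` register slot are assumptions made so the fragment is self-contained; in FSR 2 the `r_auto_exposure` declaration comes from the shared HLSL callbacks header once `FSR2_BIND_SRV_AUTO_EXPOSURE` is defined. The sketch also assumes the exposure resource is actually bound as an SRV when this pass runs.

```hlsl
// Sketch only: hypothetical stand-alone declaration of the exposure SRV.
// The slot index (0 / t0) is an assumption for illustration.
#define FSR2_BIND_SRV_AUTO_EXPOSURE 0

#if defined(FSR2_BIND_SRV_AUTO_EXPOSURE)
Texture2D<float2> r_auto_exposure : register(t0);
#endif

// The original version reads the same texel through the rw_auto_exposure UAV,
// which is a typed UAV load of a two-channel format.
float2 SPD_LoadExposureBuffer()
{
    // Texture2D::Load takes (x, y, mip). An SRV read of R32G32_FLOAT
    // does not depend on typed UAV load support.
    return r_auto_exposure.Load(int3(0, 0, 0));
}
```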
2. The rw_img_mip_5 Load Issue:
- Problem: The generic SPD algorithm calls the `SpdLoad` function, which in turn calls `SPD_LoadMipmap5`. This function reads from the `rw_img_mip_5` UAV, another `Typed UAV Load`.
- Solution: The SPD algorithm already stores intermediate results in group-shared memory (`spdIntermediateR/G/B/A`). By modifying the `SpdLoad` function in `ffx_fsr2_compute_luminance_pyramid.h` to read directly from this shared memory (by calling `SpdLoadIntermediate`), the costly and problematic global memory access is completely avoided. This also seems to align better with the optimal data flow for a parallel reduction algorithm.
I have prepared a Gist containing the two modified files that implement these solutions. No C++ or backend changes are required.
Gist with the modified files:
https://gist.github.com/Montazeran8/bc037da2b7fb32cb3b4c08687071adb8
Summary of Benefits:
- Compatibility: Completely resolves the `Typed UAV Load` issue, allowing the luminance pyramid pass to compile and run on older D3D11 hardware.
- Performance: The `SpdLoad` modification, in particular, avoids a recurring global memory access inside the SPD reduction loop and uses the much faster group-shared memory (LDS). This should be a performance benefit on all hardware, not just older cards.
Thank you for your time and for considering this feedback. I hope this suggestion is helpful!