Deferred Physically Based Rendering - Storing four 8-bit values in one 32-bit value using HLSL

LeeHesselden · August 25, 2020, 12:15pm

I’m working on a deferred PBR rendering engine that uses the 128-bit surface format (SurfaceFormat.Vector4) to perform rendering in a single geometry pass (excluding shadows). I’m storing colour in R (float3), normal in G (float3), material in B (float3) and finally depth in A (32-bit float). This works, except that I’m only packing three 8-bit values into each of the R, G and B channels, so I lose one byte per channel on each pixel (3 bytes lost per pixel; depth in A is a plain 32-bit float, so nothing is lost there). I used the following HLSL to push the data into each channel.

#define f3_f(c) (dot(round((c) * 255), float3(65536, 256, 1)))
#define f_f3(f) (frac((f) / float3(16777216, 65536, 256)))
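A sketch of why this style of packing tops out at three bytes, assuming standard 32-bit floats: a float has a 24-bit significand, so every integer up to 16777216 is represented exactly, and that is exactly three 8-bit values. The function names PackUnorm3 and UnpackUnorm3 below are hypothetical, not part of the original code. The unpack uses floor rather than frac so that each byte comes back as exactly n / 255; the frac macro above returns values scaled by 256 with the lower bytes mixed in.

```hlsl
// Hypothetical helpers, written as functions instead of macros.

// Pack three [0,1] channels into one float holding a 24-bit integer
// (R * 65536 + G * 256 + B). The result is at most 16777215, so it is
// exactly representable in a 32-bit float.
float PackUnorm3(float3 c)
{
    float3 bytes = round(saturate(c) * 255.0f);
    return dot(bytes, float3(65536.0f, 256.0f, 1.0f));
}

// Recover the three bytes. Scaling by a power of two and taking floor
// are both exact for integers in this range.
float3 UnpackUnorm3(float packed)
{
    float r = floor(packed * (1.0f / 65536.0f));
    float rest = packed - r * 65536.0f;
    float g = floor(rest * (1.0f / 256.0f));
    float b = rest - g * 256.0f;
    return float3(r, g, b) / 255.0f;
}
```

For example, PackUnorm3(float3(1, 0, 0)) is 255 * 65536 = 16711680, and UnpackUnorm3 of that value gives back (1, 0, 0).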

This wasn’t a problem initially, as I only had material properties for Roughness, Metalness and Emission in the game, so I could reluctantly accept the lost bytes. However, I’m adding ambient occlusion and could use the spare bytes on colour and normal for other properties. I’ve managed to create a nasty version, below, that crams emission and occlusion into the third byte, but this gives each of them only 4 bits (16 levels), which looks okay but still wastes data. The current code for the material layer is below.

inline float4 f_f4(float v)
{
   
    float3 f3 = f_f3(v);  
    uint b, a, c;
    
    // f_f3 returns the low byte as B / 256, so scale by 256 (not 255) and round to recover B
    c = uint(round(f3.b * 256.0f));
    b = (c & 0xF0) >> 4;
    a = c & 0x0F;
    
    return float4(f3.r, f3.g, float(b) / 15.0f, float(a) / 15.0f);
}

inline  float f4_f(float4 color)
{      
    uint b, a, c;
    
    b = (uint) (color.b * 15);
    b = b << 4;
    a = (uint) (color.a * 15);
    c = a | b;
    
    return f3_f(float3(color.r, color.g, float(c) / 255.0f));
}

float4 HFXDeferedDataPixelShaderFunction(HFXDeferedDataVertexShaderOutput input) : COLOR0
{
    float3 f3Normal;
    float3 f3Color;
    float4 f4Material;
    float4 f4Final;

    // Get normal from sampler and combine with world normal
    f3Normal = NormalMapConstant * (NormalMap.Sample(NormalSampler, input.TextureCoordinate).xyz - float3(0.5, 0.5, 0.5));
    f3Normal = input.Normal + (f3Normal.x * input.Tangent + f3Normal.y * input.Binormal);
    f3Normal = normalize(f3Normal);
    f3Normal = 0.5f * (f3Normal + 1.0f);

    // Get texture color from sampler
    f3Color = tex2D(TextureSampler, input.TextureCoordinate).xyz;

    // Get material properties from sampler
    f4Material = tex2D(MaterialSampler, input.TextureCoordinate);

    // Encode data
    f4Final = float4(f3_f(f3Color),
                     f3_f(f3Normal),
                     f4_f(f4Material),
                     1.0f - (input.Depth.x / input.Depth.y));

    // Return final
    return f4Final;
}

From all my reading, this wasn’t easy in earlier shader models. I’ve seen many ways of attempting it, none of which worked for me, and some people say it isn’t possible. However, shader model 4 has two functions, “asfloat” and “asuint”, that should be able to do exactly this. Despite trying a LOT of different things, I can’t get this to work. It only needs to work on Windows using DirectX. Has anyone out there been able to do this, and could you provide f4_f and f_f4 functions like the ones above that don’t lose the last byte?

TLDR: Need a float (32-bit) to float4 (8-bit x 4) converter method for HLSL in both directions. Windows DirectX.

Stainless · August 25, 2020, 3:21pm

You do know that you can bind multiple render targets in a single draw call, don’t you?

RenderTargetBinding[] GBufferTargets;
GBufferTargets = new RenderTargetBinding[4];
GBufferTargets[0] = new RenderTargetBinding(colorRT);
GBufferTargets[1] = new RenderTargetBinding(normalRT);
GBufferTargets[2] = new RenderTargetBinding(depthRT);
GBufferTargets[3] = new RenderTargetBinding(materialRT);
device.SetRenderTargets(GBufferTargets);

Then you just define a struct for the pixel shader output

struct PixelShaderOutput
{
    float4 Color    : COLOR0;
    float4 Normal   : COLOR1;
    float  Depth    : COLOR2;
    float4 Material : COLOR3;
};

Then you can leverage the GPU’s hardware to do the conversions you are worried about.
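A minimal sketch of a pixel shader that fills such a struct, so each G-buffer property goes to its own render target in one pass. The input struct GBufferPixelInput, the function name GBufferPixelShader and the sampler declarations are assumptions for illustration; only the output struct comes from the post above. Normal mapping is left out to keep it short.

```hlsl
// Hypothetical sampler declarations; the real effect would bind its own.
sampler2D TextureSampler;
sampler2D MaterialSampler;

// Hypothetical input layout, assumed to be produced by the vertex shader.
struct GBufferPixelInput
{
    float2 TextureCoordinate : TEXCOORD0;
    float3 Normal            : TEXCOORD1;
    float2 Depth             : TEXCOORD2; // x = clip-space z, y = clip-space w
};

struct PixelShaderOutput
{
    float4 Color    : COLOR0;
    float4 Normal   : COLOR1;
    float  Depth    : COLOR2;
    float4 Material : COLOR3;
};

PixelShaderOutput GBufferPixelShader(GBufferPixelInput input)
{
    PixelShaderOutput output;

    // Albedo goes straight to the first target; no manual packing needed.
    output.Color = float4(tex2D(TextureSampler, input.TextureCoordinate).rgb, 1.0f);

    // Remap the unit normal from [-1,1] to [0,1] so it fits an unsigned format.
    output.Normal = float4(0.5f * (normalize(input.Normal) + 1.0f), 0.0f);

    // Same depth encoding as the single-target version in this thread.
    output.Depth = 1.0f - (input.Depth.x / input.Depth.y);

    // All four material channels keep their full 8 bits each.
    output.Material = tex2D(MaterialSampler, input.TextureCoordinate);

    return output;
}
```

With this layout the render target formats do the quantisation, so the colour, normal and material targets can each be an 8-bit-per-channel format while depth uses a single-channel float format.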

I know this is not answering your question directly, but it seems a lot simpler to me to do it this way, and I would think it’s going to be faster as well.

Is there any particular reason you want to use a single vector4 render target?

LeeHesselden · August 25, 2020, 4:07pm

Hi, thank you for replying, very much appreciated.

I didn’t. I had separate render targets to start with, though I used multiple geometry passes rather than the method you suggest, which does look like a cleaner way of solving the problem. I’m converting from a forward renderer to a deferred one to try to improve performance, and learning as I go.

I had thought this way would perform well as it reduces the number of samples in the deferred combine and in the water shader which samples colour for alpha, depth for wash effect and normal for refraction.

Do you think the additional packing outweighs the cost of the additional sampling and passing multiple textures in combine? For example to get all data at the moment I do one sample and then unpack as below.

    // Pull deferred data from sampler
    f4DeferedData = DeferedDataLayer.SampleLevel(DeferedDataLayerSampler, f2TextureCoordinate, 0);

    // Unpack data
    f3Color = f_f3(f4DeferedData.x);
    f3Normal = f_f3(f4DeferedData.y);
    f3Normal = 2.0f * f3Normal - 1.0f;
    f4Material = f_f4(f4DeferedData.z);
    fDepth = 1 - f4DeferedData.a;

    // Convert depth to world position
    f2ScreenCoordinate = float2(f2TextureCoordinate.x * 2 - 1, (1 - f2TextureCoordinate.y) * 2 - 1);
    f4Position = mul(float4(f2ScreenCoordinate.x, f2ScreenCoordinate.y, fDepth, 1.0f), InverseViewProjection);
    f4Position /= f4Position.w;

Charles_Humphrey · August 25, 2020, 8:21pm

If you want to take a look at my deferred lighting sample on GitHub, it uses multiple render targets.

LeeHesselden · August 26, 2020, 8:23am

Thank you for the replies.

I’m going to look further into the other MonoGame deferred projects mentioned and see what ideas I can use. I have now managed to get the originally requested code to work correctly without the byte loss. It was a simple maths error: I had not been unpacking the channels correctly, expecting 0 to 255 instead of 0 to 1. I present the code below in case it is of use to someone packing data for some other purpose, even if not the one I originally intended. It could likely be optimised further, but it works correctly with MonoGame 3.8 as it stands.

inline float4 f_f4(float fValue)
{
    
    uint uiValue = asuint(fValue);
    float4 f4Value;
    
    f4Value.r = ((uiValue & 0xFF000000) >> 24) / 255.0f;
    f4Value.g = ((uiValue & 0x00FF0000) >> 16) / 255.0f;
    f4Value.b = ((uiValue & 0x0000FF00) >> 8) / 255.0f;
    f4Value.a = ((uiValue & 0x000000FF)) / 255.0f;
    
    return f4Value;

}

inline float f4_f(float4 f4Value)
{
    uint r, g, b, a;
    float fValue;
    
    r = uint(f4Value.r * 255.0f) << 24;
    g = uint(f4Value.g * 255.0f) << 16;
    b = uint(f4Value.b * 255.0f) << 8;
    a = uint(f4Value.a * 255.0f);
    fValue = asfloat(r | g | b | a);
    
    return fValue;
}
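Two caveats that may be worth checking, offered as assumptions rather than anything verified in this thread. First, uint(x * 255.0f) truncates, so a channel value that arrives slightly below n / 255 can come out one step too low; rounding avoids that. Second, asfloat can produce bit patterns that are Inf, NaN or denormal as floats (exponent bits all ones, or all zeros with a non-zero mantissa), and it is not guaranteed that every GPU path preserves those bits exactly, particularly if blending or filtering touches the target. The names PackUnorm4x8, UnpackUnorm4x8 and IsFragileFloatPattern below are hypothetical.

```hlsl
// Hypothetical variant of f4_f: clamps to [0,1] and rounds to nearest.
float PackUnorm4x8(float4 v)
{
    uint4 bytes = (uint4) round(saturate(v) * 255.0f);
    uint bits = (bytes.r << 24) | (bytes.g << 16) | (bytes.b << 8) | bytes.a;
    return asfloat(bits);
}

// Hypothetical variant of f_f4: same byte layout, r in the top byte.
float4 UnpackUnorm4x8(float packed)
{
    uint bits = asuint(packed);
    uint4 bytes = uint4(bits >> 24, bits >> 16, bits >> 8, bits) & 0xFF;
    return (float4) bytes / 255.0f;
}

// Debug helper: true when the packed bits read as Inf/NaN or as a
// denormal when interpreted as a 32-bit float.
bool IsFragileFloatPattern(uint bits)
{
    uint exponent = (bits >> 23) & 0xFF;   // low 7 bits of r, top bit of g
    uint mantissa = bits & 0x007FFFFF;
    return exponent == 0xFF || (exponent == 0 && mantissa != 0);
}
```

With this layout the exponent is made of the low seven bits of r and the top bit of g, so for example r = 1.0 or r = 127 / 255 combined with g >= 128 / 255 lands in the Inf/NaN range. If artefacts appear only for particular material values, that is one place to look.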

Kwyrky · August 26, 2020, 8:59am

I think this is the right thread where you can find kosmonaut’s deferred renderer, which also used PBR. I’m sure you can find some good examples in there for your project.

roy-t · August 27, 2020, 6:42pm

Heya, just to chime in. I’ve also got a deferred renderer (though not PBR) and I also use the multiple render targets approach. See: https://github.com/roy-t/MiniRTS