HLSL shader optimization

Hi

I’m new to HLSL shaders, and I recently made a pixel shader for a pixel font. I first draw the game text to a full-screen render target and then apply the pixel shader. It looks like this:

[Screenshot: before]

[Screenshot: after]

The shader is fairly heavy because of the multiple outlines (though still a bit better than drawing the text 30+ times with DrawString to recreate the same effect), so I’m curious whether I can optimize it to be faster.

This is the shader (edited to show the current version):

#if OPENGL
    #define SV_POSITION POSITION
    #define VS_SHADERMODEL vs_3_0
    #define PS_SHADERMODEL ps_3_0
#else
    #define VS_SHADERMODEL vs_4_0_level_9_1
    #define PS_SHADERMODEL ps_4_0_level_9_1
#endif

float2 texelSize; // (1 / render target width, 1 / render target height)
int shadow;       // 1 = also draw the drop shadow

Texture2D SpriteTexture;
sampler2D SpriteTextureSampler = sampler_state
{
    Texture = <SpriteTexture>;
};

struct VertexShaderOutput
{
    float4 Position : SV_POSITION; // as in the default MonoGame sprite effect template
    float4 Color : COLOR0;
    float2 TextureCoordinates : TEXCOORD0;
};

// Alpha of the texel that lies (dx, dy) texels away from uv.
float SampleAlpha(float2 uv, float dx, float dy)
{
    return tex2D(SpriteTextureSampler, uv + float2(dx, dy) * texelSize).a;
}

float4 MainPS(VertexShaderOutput input) : COLOR
{
    float2 uv = input.TextureCoordinates;
    float4 color = tex2D(SpriteTextureSampler, uv) * input.Color;

    // Only transparent pixels can become outline or shadow pixels.
    if (color.a == 0)
    {
        float alpha = 0;

        // Inner outline: the 8 texels at distance 1.
        alpha = max(alpha, SampleAlpha(uv,  1,  0));
        alpha = max(alpha, SampleAlpha(uv, -1,  0));
        alpha = max(alpha, SampleAlpha(uv,  0,  1));
        alpha = max(alpha, SampleAlpha(uv,  0, -1));
        alpha = max(alpha, SampleAlpha(uv, -1, -1));
        alpha = max(alpha, SampleAlpha(uv,  1,  1));
        alpha = max(alpha, SampleAlpha(uv,  1, -1));
        alpha = max(alpha, SampleAlpha(uv, -1,  1));
        if (alpha > 0)
            return float4(1, 1, 1, alpha);

        // Outer outline: the 16 texels at distance 2.
        alpha = max(alpha, SampleAlpha(uv,  2,  0));
        alpha = max(alpha, SampleAlpha(uv, -2,  0));
        alpha = max(alpha, SampleAlpha(uv,  0,  2));
        alpha = max(alpha, SampleAlpha(uv,  0, -2));
        alpha = max(alpha, SampleAlpha(uv, -2,  1));
        alpha = max(alpha, SampleAlpha(uv, -2, -1));
        alpha = max(alpha, SampleAlpha(uv, -1,  2));
        alpha = max(alpha, SampleAlpha(uv,  1,  2));
        alpha = max(alpha, SampleAlpha(uv, -1, -2));
        alpha = max(alpha, SampleAlpha(uv,  1, -2));
        alpha = max(alpha, SampleAlpha(uv,  2,  1));
        alpha = max(alpha, SampleAlpha(uv,  2, -1));
        alpha = max(alpha, SampleAlpha(uv, -2, -2));
        alpha = max(alpha, SampleAlpha(uv,  2,  2));
        alpha = max(alpha, SampleAlpha(uv,  2, -2));
        alpha = max(alpha, SampleAlpha(uv, -2,  2));
        if (alpha > 0)
            return float4(0.18f, 0.18f, 0.18f, alpha);

        // Drop shadow: 9 texels at distance 3, mostly above and to the left,
        // so the shadow falls down and to the right of the text.
        if (shadow == 1)
        {
            alpha = max(alpha, SampleAlpha(uv, -3,  0));
            alpha = max(alpha, SampleAlpha(uv,  0, -3));
            alpha = max(alpha, SampleAlpha(uv, -3,  1));
            alpha = max(alpha, SampleAlpha(uv, -3, -1));
            alpha = max(alpha, SampleAlpha(uv, -1, -3));
            alpha = max(alpha, SampleAlpha(uv,  1, -3));
            alpha = max(alpha, SampleAlpha(uv, -3, -2));
            alpha = max(alpha, SampleAlpha(uv, -2, -3));
            alpha = max(alpha, SampleAlpha(uv, -3, -3));
            if (alpha > 0)
                return float4(0.09f, 0.09f, 0.09f, alpha);
        }
    }

    return color;
}

technique SpriteDrawing
{
    pass P0
    {
        PixelShader = compile PS_SHADERMODEL MainPS();
    }
};

I’m simply testing whether a pixel lies within a certain distance of the text and giving it the corresponding outline color. One pixel in the shader does not correspond to one pixel in the font. That’s why I originally had the for loops (the current version above has them unrolled).

Thanks in advance.
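The per-pixel test described above can be modelled on the CPU, which makes the sampling pattern easier to check outside the GPU. This is a minimal Python sketch, not MonoGame code: it assumes the text is a 2D grid of alpha values indexed as `mask[y][x]`, that reads outside the grid count as transparent (the `PointClamp` sampler state passed to `SpriteBatch.Begin` would instead repeat the edge texels), and that +y points down as in texture space. The names `classify_pixel`, `INNER`, `OUTER` and `SHADOW` are made up for illustration; the offsets match the ones in the shader.

```python
# CPU model of the outline shader's per-pixel decision (illustration only).
# Assumptions: mask[y][x] holds alpha in [0, 1]; out-of-range reads are
# transparent; +y points down, as in texture space.

# The 8 texels at distance 1 and the 16 at distance 2 (Chebyshev distance).
INNER = [(dx, dy) for dx in range(-1, 2) for dy in range(-1, 2)
         if max(abs(dx), abs(dy)) == 1]
OUTER = [(dx, dy) for dx in range(-2, 3) for dy in range(-2, 3)
         if max(abs(dx), abs(dy)) == 2]
# The 9 drop-shadow offsets copied from the shader. They look up and to the
# left, so the shadow lands down and to the right of the text.
SHADOW = [(-3, 0), (0, -3), (-3, 1), (-3, -1), (-1, -3),
          (1, -3), (-3, -2), (-2, -3), (-3, -3)]


def alpha_at(mask, x, y):
    """Alpha at (x, y), or 0 outside the grid."""
    if 0 <= y < len(mask) and 0 <= x < len(mask[0]):
        return mask[y][x]
    return 0.0


def classify_pixel(mask, x, y, shadow=True):
    """Return 'text', 'inner', 'outer', 'shadow' or 'empty', testing the
    rings in the same order as the shader's early returns."""
    if alpha_at(mask, x, y) > 0:
        return "text"
    rings = [("inner", INNER), ("outer", OUTER)]
    if shadow:
        rings.append(("shadow", SHADOW))
    for name, offsets in rings:
        if max(alpha_at(mask, x + dx, y + dy) for dx, dy in offsets) > 0:
            return name
    return "empty"


# A single opaque texel at (5, 5).
mask = [[0.0] * 10 for _ in range(10)]
mask[5][5] = 1.0
print(classify_pixel(mask, 6, 5))  # inner
print(classify_pixel(mask, 7, 7))  # outer
print(classify_pixel(mask, 8, 8))  # shadow (down-right of the text)
print(classify_pixel(mask, 2, 2))  # empty (no shadow up-left of the text)
```

Because each ring returns early, a pixel next to the text never pays for the outer-ring or shadow reads; only pixels far from any text go through all of them.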

I noticed your outline width is 4; however, the overall visual effect is that it looks like a single pixel. If your text (and graphics) are all scaled up to look that way, you might consider rendering unscaled to a render target a quarter of the size of your full screen (a quarter of the width and a quarter of the height). This will drastically reduce the number of pixels your shader has to process and will also allow you to set your outline width to 1.

Sorry if I misread how this is working :)
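To put rough numbers on this suggestion, here is a small Python calculation. It assumes a 1920x1080 back buffer and a scale of 4 (neither is stated in the thread) and uses the current shader's worst case of 34 texture reads per pixel (1 centre + 8 inner + 16 outer + 9 shadow), i.e. a transparent pixel far from any text with the shadow enabled.

```python
# Back-of-the-envelope cost of the outline pass (illustrative numbers).
# Assumption: a 1920x1080 back buffer and a pixel-art scale of 4.
WORST_CASE_FETCHES = 1 + 8 + 16 + 9  # centre + inner ring + outer ring + shadow


def low_res_size(screen_w, screen_h, scale):
    """Render-target size when the scene is drawn unscaled and later blown up `scale`x."""
    # Integer division: assumes the screen size is a multiple of `scale`.
    return screen_w // scale, screen_h // scale


def worst_case_fetches(w, h):
    """Upper bound on texture reads if every pixel takes the longest path."""
    return w * h * WORST_CASE_FETCHES


full = worst_case_fetches(1920, 1080)
small_w, small_h = low_res_size(1920, 1080, 4)
small = worst_case_fetches(small_w, small_h)
print(small_w, small_h, full // small)  # 480 270 16
```

Because both dimensions shrink by 4, the pixel count, and with it the worst-case number of reads, drops by 4 × 4 = 16.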


Might be a good idea, thanks!
I’ll try this ASAP, and post the FPS improvement compared with DrawString() in the other thread.

Edit: Can I somehow apply the shader before scaling? The result is still the same if I draw the small render target (scale = 4) in a SpriteBatch with the shader passed as a parameter. My current code is something like this:

        Assets.Graphics.GraphicsDevice.SetRenderTarget(rt);
        Assets.Graphics.GraphicsDevice.Clear(Color.Transparent);
        Assets.SpriteBatch.Begin(SpriteSortMode.Deferred, BlendState.AlphaBlend, SamplerState.PointClamp, DepthStencilState.Default, RasterizerState.CullCounterClockwise, null);
        // Write text to rendertarget
        Assets.SpriteBatch.End();
        Assets.Graphics.GraphicsDevice.SetRenderTarget(null);
        Assets.Graphics.GraphicsDevice.Clear(Color.CornflowerBlue);
        // Other stuff, world drawing
        Assets.SpriteBatch.Begin(SpriteSortMode.Immediate, BlendState.AlphaBlend, SamplerState.PointClamp, DepthStencilState.Default, RasterizerState.CullCounterClockwise, fontOutlineShader); // Apply shader
        Assets.SpriteBatch.Draw(rt, Vector2.Zero, Color.White); // Draw rendertarget (or draw a downscaled version at scale = 4)
        Assets.SpriteBatch.End();
        // UI drawing (mouse etc)

Oh yeah, good point. You might just have to do a two-pass render through a render target.

So: write to render target -> apply shader & overwrite render target with new data -> draw render target?

Is this efficient? I’m curious about the performance impact of all those SpriteBatch.Begin calls.

I’m not sure. You’ve got a really good setup to test it, though :D

It should only be three passes in total, so that shouldn’t be too bad.

First = Draw base assets to target.
Second = Post processing pass on target.
Third = Draw target to screen scaled up.
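The three passes can be modelled on the CPU to show the data flow. This Python sketch is an illustration, not MonoGame code: the outline step is a deliberately tiny one-texel stand-in for the real shader, and the buffer and function names are hypothetical. It also shows why the second pass should write to a separate buffer rather than overwrite the one it reads from (the "overwrite rendertarget" step asked about above): done in place, pixels that were just turned into outline count as text for their neighbours. On the GPU, binding the same render target as both the input texture and the output is generally not allowed either, so in MonoGame this would presumably mean a second RenderTarget2D of the same small size.

```python
# CPU model of the three passes (data flow only, not MonoGame code).
# Buffers are lists of rows of alpha values in [0, 1].


def blank(w, h):
    return [[0.0] * w for _ in range(h)]


def draw_text(buf, pixels):
    """Pass 1: 'draw' text into the small target by making (x, y) opaque."""
    for x, y in pixels:
        buf[y][x] = 1.0


def outline_pass(src, dst):
    """Pass 2: read only from src, write only to dst.
    Text stays 1.0; transparent texels next to text become a 0.5 outline.
    (A deliberately tiny stand-in for the real outline shader.)"""
    h, w = len(src), len(src[0])
    for y in range(h):
        for x in range(w):
            if src[y][x] > 0:
                dst[y][x] = src[y][x]
                continue
            near_text = any(src[ny][nx] > 0
                            for ny in range(max(y - 1, 0), min(y + 2, h))
                            for nx in range(max(x - 1, 0), min(x + 2, w)))
            dst[y][x] = 0.5 if near_text else 0.0


def upscale_nearest(src, scale):
    """Pass 3: nearest-neighbour upscale, like drawing the target with PointClamp."""
    return [[v for v in row for _ in range(scale)]
            for row in src for _ in range(scale)]


small_a, small_b = blank(4, 3), blank(4, 3)
draw_text(small_a, [(1, 1)])
outline_pass(small_a, small_b)          # two buffers ("ping-pong")
screen = upscale_nearest(small_b, 4)    # 16 x 12 "screen"

# Reading and writing the same buffer lets freshly written outline texels
# count as text for their neighbours, so the outline keeps growing.
in_place = [row[:] for row in small_a]
outline_pass(in_place, in_place)
print(small_b[0][3], in_place[0][3])    # 0.0 0.5
```

The post-processing in pass 2 runs on the small buffer, so the outline is one texel wide before pass 3 enlarges everything, which is the ordering the downscaling suggestion relies on.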


Thanks for the fast reply! I’ll test the performance when I can :P
Yeah, three for the text, but two more for the UI (no effects, no transformation) and world (camera transformation) rendering.

Edit: I can draw the two-pass rendertarget together with the UI.

Hmm, weird. I still need to use outlineWidth = 4 to get the desired effect for the font if I downscale my render target and text. I’m still new to render targets, so I might be doing something wrong here.

[Screenshot: the render target drawn after clearing the graphics device to white (all text is now drawn at scale 0.25f)]

[Screenshot: the render target drawn with the shader applied (outlineWidth = 4)]

[Screenshot: the render target drawn with the shader applied (outlineWidth = 1)]

That’s odd… you did calculate the new value of texelSize for your smaller RenderTarget and pass that through to your shader, right?

(By the way, that sand-looking tile… can that be rendered with the text in the downscaled render target? It looks like it has the exact same effect on it.)
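Here is the arithmetic behind the texelSize question, as a small Python sketch. It assumes the full-screen target was 1920x1080 and the small one 480x270 (the thread doesn't give the sizes). An offset of one stale, full-screen `texelSize` only moves a quarter of a texel in the small target, which lines up with needing `outlineWidth = 4` before the fix. On the C# side the fix would be something along the lines of `fontOutlineShader.Parameters["texelSize"].SetValue(new Vector2(1f / rt.Width, 1f / rt.Height));`, using the small render target's size.

```python
# Why a stale texelSize made outlineWidth = 4 necessary.
# Assumed sizes: 1920x1080 full screen, 480x270 small target (scale 4).


def texel_size(width, height):
    """Value the shader's texelSize parameter should hold for a width x height target."""
    return 1.0 / width, 1.0 / height


def offset_in_texels(uv_offset, target_width):
    """How many texels of a target a horizontal UV offset actually spans."""
    return uv_offset * target_width


stale_x, _ = texel_size(1920, 1080)  # still the full-screen value
fresh_x, _ = texel_size(480, 270)    # correct for the small target

print(round(offset_in_texels(stale_x, 480), 6))      # 0.25
print(round(offset_in_texels(4 * stale_x, 480), 6))  # 1.0
print(round(offset_in_texels(fresh_x, 480), 6))      # 1.0
```

With point sampling, a quarter-texel offset mostly lands back in the same texel, so four of them were needed to reach the neighbour.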

Thanks again :)! I forgot to change the texelSize…
The outlines on the sand texture are the same, but the UI is temporary.
The shader performance seems about the same as before, probably because of the extra SpriteBatch.Begin calls?

Edit: I’ll probably see some performance improvement over a full-screen render target at very high resolutions.

Hmm, I’m surprised an additional SpriteBatch call brought you back to your old performance, where you said you had 30+ :(

Not 30+ SpriteBatch.Begin calls: 30+ SpriteBatch.DrawString calls per string drawn on screen.
I mean the old shader method (full-screen render target). The shader is much faster than the DrawString method because each new string needs 30+ additional DrawString calls, while the shader’s cost stays the same no matter how many strings there are.
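A rough way to see why the shader approach scales better with the number of strings. This Python sketch takes the "30+" as 30 DrawString calls per outlined string, assumes the 480x270 target from the downscaling discussion, and uses the current shader's worst case of 34 texture reads per pixel. The two numbers are in different units (draw calls vs. texture reads), so the point is only how each one grows, not which one is bigger.

```python
# How each approach grows with the number of strings (illustrative numbers).
DRAWSTRING_CALLS_PER_STRING = 30   # the "30+" from the thread, rounded down
WORST_CASE_FETCHES_PER_PIXEL = 34  # 1 + 8 + 16 + 9 in the current shader


def drawstring_outline_cost(num_strings):
    """Outline built from repeated DrawString calls: cost grows per string."""
    return {"drawstring_calls": num_strings * DRAWSTRING_CALLS_PER_STRING}


def shader_outline_cost(num_strings, target_w=480, target_h=270):
    """One DrawString per string, then one fixed-size post-process pass."""
    return {"drawstring_calls": num_strings,
            "texture_reads": target_w * target_h * WORST_CASE_FETCHES_PER_PIXEL}


for n in (1, 10, 100):
    print(n, drawstring_outline_cost(n), shader_outline_cost(n))
```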

Oh I see. Interesting.


So you don’t have the loops anymore now? I think you can use ddx and ddy instructions to replace some or maybe even all of the texture fetches.

Instead of alpha = max(alpha, sample) you can do alpha += sample. Not sure whether that will have an impact; the compiler might already optimize it.

What does the shader look like right now?
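On the `alpha += sample` idea: the shader's branches only ask whether any neighbour is non-zero, and for non-negative alphas a sum is non-zero exactly when the maximum is. The value that ends up as the outline alpha does differ, though, so a switch would want HLSL's `saturate()` around the sum. This Python sketch checks that on a few sample lists (the helper names are hypothetical); whether `+=` is actually faster than `max` on a given GPU is something only profiling can tell.

```python
# max vs. sum for accumulating neighbour alphas (CPU illustration).


def acc_max(samples):
    return max(samples, default=0.0)


def acc_sum(samples):
    return sum(samples)


def saturate(v):
    """Clamp to [0, 1], like HLSL's saturate()."""
    return min(max(v, 0.0), 1.0)


cases = [
    [0.0, 0.0, 0.0],   # no text nearby
    [0.0, 1.0, 1.0],   # pixel font: alphas are 0 or 1
    [0.25, 0.5, 0.0],  # anti-aliased edge
]
for s in cases:
    # The branch decision is the same either way...
    assert (acc_max(s) > 0) == (acc_sum(s) > 0)
    # ...but the alpha value that would be returned can differ.
    print(acc_max(s), acc_sum(s), saturate(acc_sum(s)))
```

For a pure pixel font (alphas of 0 or 1), `saturate(sum)` and `max` give the same result; on anti-aliased edges they don't.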

Nvm, replacing the texture fetches with ddx/ddy won’t work. You can, however, use ddx/ddy to get the texel size so you don’t have to pass it into the shader.
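A caveat on getting the texel size from `ddx`/`ddy`: those intrinsics return how much a value changes between neighbouring screen pixels, so `ddx(uv)` only equals the texel size when the texture is drawn 1:1 (as when post-processing into an equally sized target). When the small target is drawn scaled up by 4, it yields a quarter of a texel. This Python sketch models that with a finite difference; the function names are hypothetical. It may also be worth checking that the profile you compile for supports `ddx`/`ddy` at all, since `ps_4_0_level_9_1` is a very low profile.

```python
# What ddx(uv) measures: the change in u between horizontally adjacent screen pixels.


def u_at_screen_pixel(x, texture_width, scale):
    """u at the centre of screen pixel x when a texture_width-wide texture is
    drawn stretched by `scale` (scale=1 means drawn 1:1)."""
    return (x + 0.5) / (texture_width * scale)


def ddx_u(x, texture_width, scale):
    """Finite-difference stand-in for ddx(uv).x at screen pixel x."""
    return (u_at_screen_pixel(x + 1, texture_width, scale)
            - u_at_screen_pixel(x, texture_width, scale))


print(ddx_u(10, 480, 1) * 480)  # ~1.0: one texel, usable as texelSize.x
print(ddx_u(10, 480, 4) * 480)  # ~0.25: a quarter texel when drawn at 4x
```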

I’ve updated the original post with my current shader.
There are no loops anymore, but there’s an additional boolean value for the font shadow. How would I use ddx and ddy to get the texel size?

As a recommendation, you might consider passing in each colour you want to set as a parameter. Then you don’t have to change the shader if you want to change colours.


Very true. I’m currently still searching for an efficient way to draw multiple strings in different styles (e.g. some have a shadow and some don’t) without creating a separate render target for each group of strings that shares a style. I don’t know if this is possible.

I’m not sure if that’s possible either, unless you bake the effect into the font itself. If they were always going to be the same colour this wouldn’t be a problem, but I remember your screenshots showing the inner colour being different.

I think the best you can do here is to just batch styles and render all text of each style to a render target, post process the effect, then layer them all together.

That said, with the graphics you’ve shown thus far (quite pixelated), and using downscaled render targets, modern computers should be able to handle a lot without significant performance losses. If they can’t, you might just have to rethink how many text effects you want, or limit the colours for some of the effects and bake those visuals into sprites so you can avoid post-processing.
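The batching idea, combined with the earlier suggestion to pass the colours in as parameters, can be sketched like this in Python. `TextStyle` and `batch_by_style` are hypothetical names; the idea is that each distinct style gets one render target and one post-processing pass, with that style's values set as shader parameters, however many strings use it. The colour values are the ones hard-coded in the current shader, and the button style without the outer outline comes from later in the thread.

```python
# Group strings by style: one render target + one post-process pass per style.
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple

Colour = Tuple[float, float, float]


@dataclass(frozen=True)  # frozen -> hashable, so a style can be a dict key
class TextStyle:
    # Hypothetical fields that would be passed to the shader as parameters
    # instead of being hard-coded in it.
    inner_outline: Colour = (1.0, 1.0, 1.0)
    outer_outline: Optional[Colour] = (0.18, 0.18, 0.18)  # None = no outer outline
    shadow: bool = True


def batch_by_style(draws):
    """draws: iterable of (text, style). Returns {style: [texts]} in first-seen order."""
    groups = defaultdict(list)
    for text, style in draws:
        groups[style].append(text)
    return dict(groups)


DEFAULT = TextStyle()
BUTTON = TextStyle(outer_outline=None)

groups = batch_by_style([("Play", BUTTON), ("Score: 10", DEFAULT), ("Quit", BUTTON)])
for style, texts in groups.items():
    # Per group: set the style's parameters, draw its texts to one target,
    # post-process it, then layer it onto the screen.
    print(style, texts)
print(len(groups), "post-process passes for 3 strings")
```

The number of render targets and post-processing passes then grows with the number of distinct styles on screen, not with the number of strings.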


Thank you for the advice.

That was the DrawString() call that is executed once per string before applying the outline shader. The two main styles are just the default outlines and a version without the outer outline for text used as a button.