Can anyone provide any advice for optimizing a multi-pass terrain shader?

I’m not quite sure where to start. I know exactly where the GPU time is going: too many tex2D calls on 1024x1024 textures. What I don’t know is how to fix it, and I could use some more eyes on the problem.

The situation: my terrain has 24 biomes, which move around the terrain as the game progresses.

I’m rendering this in 6 passes. Each pass applies:

  • a low-res global normal map,
  • the pass’s biome blend map,
  • 4 biome-specific height maps,
  • 4 biome-specific normal maps,
  • 4 biome-specific colour maps.

That’s a rather nasty 14 * 6 = 84 tex2D calls, for geometry that usually takes up the entire screen due to the game’s RTS-style top-down viewpoint. My GPU isn’t happy with me.
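To make the sample budget concrete, here is a rough back-of-the-envelope sketch in Python using the per-pass counts listed above. The function names and the 1920x1080 resolution in the usage note are purely illustrative assumptions, and the per-frame figure is a worst case that ignores any early-outs.

```python
# Per-pass texture samples, taken from the list above.
GLOBAL_NORMAL = 1      # low-res global normal map
BLEND_MAP = 1          # the pass's biome blend map
BIOMES_PER_PASS = 4
MAPS_PER_BIOME = 3     # height, normal, colour

def samples_per_pass():
    """Texture samples issued by one pass for one pixel."""
    return GLOBAL_NORMAL + BLEND_MAP + BIOMES_PER_PASS * MAPS_PER_BIOME

def samples_per_pixel(passes=6):
    """Texture samples per pixel across all passes, with no early-out."""
    return passes * samples_per_pass()

def samples_per_frame(width, height, passes=6):
    """Worst case: terrain covers every pixel and no pass exits early.
    width/height are a hypothetical render resolution, not from the post."""
    return width * height * samples_per_pixel(passes)
```

For example, at an assumed 1920x1080 the worst case comes to `samples_per_frame(1920, 1080)`, roughly 174 million samples per frame, which is why cutting samples per pixel matters far more here than cutting geometry.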

Intuitively, I want to do a deferred render approach: render a full screen texture with screen-space depth, normal and texture coordinates, then draw the biomes on top of that. That way I only draw the terrain geometry once instead of 6 times. Problem is, that makes no difference to the amount of work the pixel shader has to do, and the terrain didn’t cause any FPS problems when I was only drawing the diffuse maps, which indicates to me that it’s the pixel shader that’s the bottleneck.

Right now, the concessions I have to pixel shader performance are fairly basic:

  1. Dumping out of the pixel shader as soon as the blend map indicates there are no relevant biomes on this pixel.
  2. Packing the height map into the normal map’s alpha channel. I’m pretty sure this is actively hurting me because I have to sample height and then translate the texture coords by the relief map before I sample the normal.

As I said, I could use a few more sets of eyes on this, to make sure I’m not missing something obvious. How would you go about using extra passes, different textures or any other tricks to optimize in a situation like this?

This is a great question that I don’t have the knowledge to help with, but you may be able to find someone who can help in the MollyRocket or HandmadeNetwork Discord.

Break it into tiles of a reasonable size and create a permutation shader (in DX12 or any modern API I would tell you to use dynamic texture indexing, but that isn’t going to fly here), so that each tile considers only the subset of biome textures it actually needs. Even then, still BRANCH (and by branch I mean a branch that won’t be flattened) to avoid additional texture samples where blending between biomes isn’t required. That said, post some pictures so we can optimize towards the desired result; text alone mostly won’t cut it.
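As a rough illustration of the tiling idea, here is a CPU-side sketch in Python (not shader code) that works out which biomes each tile actually needs. Everything in it is an assumption for illustration: the blend weights are taken to be a 2D grid of per-biome weight lists, the names are hypothetical, and the default of 2 shared samples plus 3 maps per biome simply mirrors the counts in the original post.

```python
def active_biomes_per_tile(weights, tile_size, threshold=0.0):
    """weights[y][x] is a sequence of per-biome blend weights for one texel.
    Returns {(tile_x, tile_y): bitmask of biomes whose weight exceeds
    threshold anywhere inside that tile}."""
    tiles = {}
    for y, row in enumerate(weights):
        for x, texel in enumerate(row):
            key = (x // tile_size, y // tile_size)
            mask = tiles.get(key, 0)
            for biome, w in enumerate(texel):
                if w > threshold:
                    mask |= 1 << biome
            tiles[key] = mask
    return tiles

def biome_count(mask):
    """Number of biomes set in a tile's bitmask."""
    return bin(mask).count("1")

def samples_for_tile(mask, shared_samples=2, maps_per_biome=3):
    """Texture samples per pixel if the tile's shader permutation only
    touches the biomes present in that tile (assumed cost model)."""
    return shared_samples + maps_per_biome * biome_count(mask)
```

The biome count per tile would then pick the shader permutation, and the bitmask would decide which textures get bound for that tile. Note that with filtered blend-map lookups, a tile may also need the biomes of texels just outside its border, so a real version would probably want to pad each tile by a texel or so.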

Also, I’m not sure what you mean by “pass” here; this should be a single draw call in your current setup. A pass as seen by MG (or FX systems in general) is simply binding a different shader and drawing again.