I wrote a couple of effects as an experiment, and they have some if-else statements in them. I remember reading in the past that this is really bad for performance and that if-else should be avoided.
My question is: before I butcher my effects and invest 1-2 hours in splitting them into techniques, is it really worth it?
Should I invest the time in breaking them into lots of different techniques and selecting the right one, or is this no longer relevant and if-else statements are safe to use?
Also is there a performance difference between if-else and using the question mark operator?
I can’t speak for the latter difference with regards to shaders (though I’d imagine not?), but how many conditional branches are you working with? While the GPU isn’t as good at branching as the CPU, modern GPUs support them, and there shouldn’t be issues if your shaders have a few of them.
If you’re targeting lower-end hardware, then they may be a problem. Test out what you’ve got so far and see if there are any performance issues. If there are, and they trace back to the conditionals in your shaders, then that would be the time to split them into techniques.
If statements are expensive if they require real dynamic branching. Often (probably most of the time) the compiler can solve an if statement without using a dynamic branch.
I think there are three main categories for how the compiler treats an if statement (possibly an oversimplification):
- The if statement gets completely removed, because the condition can be solved before the shader is run. This is true for conditions that only depend on constant shader parameters.
- The if statement gets removed by flattening the branches. This usually (not always) means that both branches are executed, but the false branch gets masked out. This is a good method for short branches.
- The if statement leads to a real dynamic branch. This used to be quite expensive; I’m not sure if it’s still so bad on modern GPUs.
Sounds like your if statements fall into the first category, in which case they are basically free.
No, the question mark operator is just syntactic sugar.
Thanks for the replies and links, all my if statements are indeed based on effect parameters I set prior to drawing, so it should be treated as “static branching” and should be OK.
@KakCAT's link is still correct, though it omits that dynamic branch costs are paid per execution block (wavefront, or whatever your vendor calls it): no work item (be it a vertex or a fragment) can do less work than any other in the block. The ones with nothing to do really just sit and spin.
The short “I don’t care why” summary is: imbalanced branches are bad. If one side of your if-else is a ton of instructions, then your branch is pretty much not a branch anymore.
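To put rough numbers on the "imbalanced branches" point, here is a minimal sketch of a cost model, written in Python rather than shader code. It assumes a deliberately simplified rule: the whole wavefront pays for a side of the if-else as soon as one work item needs it. The wavefront width and the instruction counts are made up for illustration and do not describe any specific GPU.

```python
def wavefront_cost(takes_then, cost_then, cost_else):
    """Instruction cost one wavefront pays for an if-else.

    takes_then: one bool per lane (work item) in the wavefront.
    Simplified model: a side is paid for by the whole wavefront
    as soon as at least one lane needs it.
    """
    cost = 0
    if any(takes_then):
        cost += cost_then   # at least one lane takes the 'then' side
    if not all(takes_then):
        cost += cost_else   # at least one lane takes the 'else' side
    return cost


LANES = 64                 # assumed wavefront width; varies by vendor
CHEAP, EXPENSIVE = 4, 200  # made-up instruction counts

all_cheap = [False] * LANES                     # every lane takes the cheap side
one_expensive = [True] + [False] * (LANES - 1)  # a single lane takes the big side

print(wavefront_cost(all_cheap, EXPENSIVE, CHEAP))      # 4
print(wavefront_cost(one_expensive, EXPENSIVE, CHEAP))  # 204
```

In this model a single work item on the expensive side makes the other 63 wait through all of it, which is why the branch "is pretty much not a branch anymore".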
Sorry for hijacking the thread, but I happen to have a perfect hands-on example question right now.
I’m trying to implement MSDF for my fonts. I also want to add borders and shadows to the fonts, but not to all of them. I’d like to reduce the number of draw calls.
So my idea is using a shader which has two branches, the “plain text branch” (shader of https://github.com/Chlumsky/msdfgen , scroll to the bottom) and the “text with borders and shadows branch” ( https://gist.github.com/Chlumsky/263c960ae0a7df59afc2da4051eb0553 , renderMsdf function)
All the characters will be put into a kind of spritebatch, and each character will have its own TexCoord, border, thickness and shadow opacity. If border is -1, for example, it will select the plain text shader, and the full-blown text shader otherwise.
Characters will probably all have about the same size, let’s say around 32x32 (will vary per character, of course)
So, in theory, what would be a better solution?
- Go with the single shader for multiple text types
- Just do two draw calls
I know the best solution is “just try it” (which is actually what I’ll do ^^), but with so much different hardware out there and only one GPU to test on, the test only gives you feedback for your current hardware.
My bet, as an uneducated GPU programmer, is that since the characters are ‘quite big’ (a lot bigger than a warp/wavefront) and the text characteristics will be laid out linearly (I mean, I’ll write e.g. 10 ‘plain’ characters, then 25 decorated characters, then 12 plain characters again, …), the impact of the jump should be low, but I’d love for somebody much more versed in GPU programming to confirm or refute it @AcidFaucent
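As a rough way to sanity-check that bet on paper, here is a small Python sketch that counts how many pixel tiles in one row of text would contain both plain and decorated pixels. Everything in it is an assumption for illustration: the 8-pixel tile width, the idea that one wavefront covers one tile, and the pessimistic idea that pixels from neighbouring glyph quads can share a wavefront at all. How pixels are really grouped is hardware-specific.

```python
GLYPH = 32  # glyph width in pixels (the post says "around 32x32")
TILE = 8    # assumed width of the pixel tile one wavefront covers


def mixed_tile_columns(styles, x0=0):
    """Count tile columns that contain both plain and decorated pixels.

    styles: one bool per glyph in a single row (True = decorated).
    x0: pixel offset of the first glyph, to model unaligned text.
    Returns (mixed tile columns, tile columns touched by text).
    """
    width = x0 + GLYPH * len(styles)
    mixed = 0
    total = 0
    for tile_start in range(0, width, TILE):
        seen = set()
        # Visit only the pixels of this tile that are covered by text.
        for x in range(max(tile_start, x0), min(tile_start + TILE, width)):
            seen.add(styles[(x - x0) // GLYPH])
        if seen:
            total += 1
        if len(seen) == 2:
            mixed += 1
    return mixed, total


# 10 plain, 25 decorated, 12 plain, as in the post.
row = [False] * 10 + [True] * 25 + [False] * 12

print(mixed_tile_columns(row, x0=0))  # (0, 188): glyph edges fall on tile edges
print(mixed_tile_columns(row, x0=3))  # (2, 189): one mixed tile per style change
```

Under these assumptions only the tiles that straddle a change of style can diverge, so with long runs of the same style the divergent fraction stays tiny.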
All in one shader. Unless you start doing some crazy squiggles or something I can’t imagine it mattering.
The extreme case of flow-control is a bytecode interpreter on the GPU. At that extreme it’s still just an order of magnitude slower than a cooked shader for a specific purpose.
From the Destiny talks (Destiny Particle Architecture) you can see that hold up in their tables. CPU bytecode was pushing ~1,500, while GPU bytecode ~15,000, and a cooked shader ~150,000.
Whatever you write, it’s probably going to be far less severe than a bytecode interpreter in a shader, so the question is: how big a fraction of an order of magnitude do you really care about?
Yesterday I found a very enlightening document that made me understand a little bit more about how shaders work and why some branches are more expensive than others.
If I understood it right (as the graphics layman that I am XD), there are two kinds of registers that shader variables can go into (decided at compile time): scalar registers (the value of the variable is the same for every work item in the wavefront) and vector registers (the value can vary from one work item to the next). When the condition is evaluated on scalar registers/values, the jump is very cheap, because it IS a jump. If the condition is evaluated on vector registers and the work items disagree, both sides of the jump must be run.
So a first hint as to whether a jump is going to be expensive or not is to look at the condition and try to figure out whether it’s scalar or vector. (Of course there will be other factors, but this one is very easy to look at.)
The document seems to be just for GCN, but I suppose the concept translates to other architectures too.
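The scalar-versus-vector distinction can be mimicked with a small Python sketch. It is only a model: `run_if_else` and the two lambdas are hypothetical names, and the sketch checks at run time whether the condition is uniform, whereas per the description above the scalar/vector decision is made at compile time.

```python
def run_if_else(cond, xs, then_fn, else_fn):
    """Run an if-else for one wavefront.

    cond: one bool per lane; xs: one input value per lane.
    Returns (per-lane results, number of branch sides executed).
    """
    if all(cond) or not any(cond):
        # Uniform ("scalar") condition: a real jump, only one side runs.
        side = then_fn if cond and cond[0] else else_fn
        return [side(x) for x in xs], 1
    # Varying ("vector") condition: both sides run for every lane and
    # the per-lane mask decides which result is kept.
    then_vals = [then_fn(x) for x in xs]
    else_vals = [else_fn(x) for x in xs]
    results = [t if c else e for c, t, e in zip(cond, then_vals, else_vals)]
    return results, 2


double = lambda x: x * 2
negate = lambda x: -x

print(run_if_else([True] * 4, [1, 2, 3, 4], double, negate))
# ([2, 4, 6, 8], 1)
print(run_if_else([True, False, True, False], [1, 2, 3, 4], double, negate))
# ([2, -2, 6, -4], 2)
```

A condition built only from effect parameters set before drawing behaves like the first call; a condition built from per-vertex or per-pixel data can behave like the second.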
There is another thing to consider, the wavefront.
Basically, when you run a shader on a bunch of pixels, the hardware operates on N pixels at the same time. N is dependent on the graphics card, and pretty obviously the larger the better.
The one thing a lot of people forget is that all these pixels start rendering at the same time, and the hardware won’t move on until all of them have been processed. So the entire batch takes as long as the SLOWEST pixel.
So adding if statements can result in different pixels taking different times to complete.
Normally this isn’t a problem, but I have seen cases where it totally destroys the wavefront. In one case a programmer “optimised” the shaders and slowed down the GPU by 38%.
Remember you can always use floats to emulate if statements. You calculate both values that were in the if statement, multiply them by a pair of floats, one with a value of 0 and the other with a value of 1, and add the two products together.
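The float trick can be sketched like this, in Python rather than shader code. The names `blend_select` and `step` are illustrative (`step` mirrors the behaviour of the HLSL intrinsic of the same name), and the border value of -1 is borrowed from the font example earlier in the thread.

```python
def step(edge, x):
    """1.0 if x >= edge else 0.0, like the HLSL step intrinsic."""
    return float(x >= edge)


def blend_select(mask, a, b):
    """Return a where mask is 1.0 and b where mask is 0.0, with no if statement."""
    return a * mask + b * (1.0 - mask)


DECORATED, PLAIN = 0.75, 0.25  # stand-ins for the two computed values

# border == -1 means "plain text"; any border >= 0 means "decorated".
print(blend_select(step(0.0, -1.0), DECORATED, PLAIN))  # 0.25
print(blend_select(step(0.0, 2.0), DECORATED, PLAIN))   # 0.75
```

Both values are always computed, so this only pays off for short branches. Also note that multiplying by 0 does not hide an infinity or NaN in the unused value.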
Just a note that you shouldn’t need to do this manually. You can just attribute your if-statement with