All in one shader. Unless you start doing some crazy squiggles or something I can’t imagine it mattering.
The extreme case of flow control is a bytecode interpreter running on the GPU. Even at that extreme it's only about an order of magnitude slower than a shader cooked for one specific purpose.
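To get a feel for why an interpreter sits at the slow end, here's a minimal sketch (in Python, with a made-up opcode set — nothing to do with Destiny's actual VM) of the gap: the interpreter pays for dispatch and stack traffic on every single instruction, while the "cooked" version has all of that flow control compiled away.

```python
PUSH, ADD, MUL = 0, 1, 2  # hypothetical opcode set, for illustration only

def interpret(bytecode, constants):
    """Run one program the slow way: dispatch on each opcode, one at a time."""
    stack = []
    for op, arg in bytecode:
        if op == PUSH:
            stack.append(constants[arg])
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack.pop()

def cooked(x, y, z):
    """The same expression with all flow control baked out."""
    return (x + y) * z

# (x + y) * z as bytecode: push x, push y, add, push z, mul
program = [(PUSH, 0), (PUSH, 1), (ADD, 0), (PUSH, 2), (MUL, 0)]
consts = [2.0, 3.0, 4.0]
assert interpret(program, consts) == cooked(2.0, 3.0, 4.0)
```

Same answer both ways; the difference is purely the per-instruction overhead, which is the kind of cost a GPU's branching model amplifies.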
From the Destiny talks (Destiny Particle Architecture) you can see that borne out in their tables: CPU bytecode was pushing ~1,500, GPU bytecode ~15,000, and a cooked shader ~150,000 — an order of magnitude at each step.
Whatever you write, it's probably going to be far less severe than a bytecode interpreter in a shader, so the question is how large a fraction of an order of magnitude you really care about.