Most of the expense in your shader comes from all the texture samples. Look up "distance field font rendering": if every pixel in the texture stores, let's say in its alpha value, its shortest distance in pixels from the glyph border, then you only have to do one sample to know whether the pixel is within the specified border range.
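To make the single-sample idea concrete, here is a rough sketch in Python rather than shader code. The function name classify, the signed-distance convention (negative inside the glyph, positive outside) and the outline_width parameter are all assumptions of mine for illustration; a real distance field texture usually packs the distance into a 0..1 range and the shader remaps it.

```python
def classify(distance, outline_width):
    """Decide what to draw for one texel from a single distance sample.

    distance: distance in pixels to the glyph border. Assumed convention:
    negative inside the glyph, positive outside.
    outline_width: outline thickness in pixels (hypothetical parameter).
    """
    if distance <= 0.0:
        return "fill"          # inside the glyph itself
    if distance <= outline_width:
        return "outline"       # outside the glyph, but within the border range
    return "transparent"       # too far from the glyph to be drawn


# One row of sampled distances crossing the edge of a glyph.
row = [-2.0, -0.5, 0.5, 1.5, 3.0]
kinds = [classify(d, outline_width=2.0) for d in row]
```

The point is that the decision needs only the one stored distance, not a ring of neighbouring samples around each pixel.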
However, to answer the question of how people generally do this...
Let's say your game has a TextControl, which can have only one style. If you want the sentence "Deals 300 damage!" with the number in a different style from the words around it, that is built from three text controls, since the pieces need to be different styles. If the whole sentence is in one style, "Deals 300 damage!" could be just one control.
Each text control is its own SpriteBatch.DrawString call, and each style is a separate SpriteFont. You just have to emit the controls in an efficient way, and rendering will be efficient.
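As a rough illustration of splitting a sentence into single-style controls and turning them into draw calls, here is a Python sketch. TextControl as a dataclass, the style names, layout and fake_measure are all hypothetical stand-ins; in XNA the width would come from SpriteFont.MeasureString and each tuple would become one SpriteBatch.DrawString call.

```python
from dataclasses import dataclass


@dataclass
class TextControl:
    text: str
    style: str  # name of the single style (font) this control is drawn with


def layout(controls, measure, x=0.0, y=0.0):
    """Return one (text, style, (x, y)) draw call per control, left to right."""
    calls = []
    for control in controls:
        calls.append((control.text, control.style, (x, y)))
        # Advance the pen by the width of the run that was just placed.
        x += measure(control.text, control.style)
    return calls


def fake_measure(text, style):
    # Hypothetical measurement: every character is 8 pixels wide in any style.
    return 8.0 * len(text)


# "Deals 300 damage!" with the number highlighted needs three controls.
sentence = [
    TextControl("Deals ", "normal"),
    TextControl("300", "highlight"),
    TextControl(" damage!", "normal"),
]
calls = layout(sentence, fake_measure)
```

If the whole sentence shared one style, the list would hold a single control and produce a single draw call.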
You may not even need your own custom shader (colorized outlines might just be a uniform). That said, there are also cool things you can do if you -do- have your own shader.