So the loops are gone now? I think you can use the ddx and ddy instructions to replace some, or maybe even all, of the texture fetches.
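To show what I mean by the ddx/ddy idea, here is a rough CPU-side model in Python (not shader code). It assumes the hardware shades pixels in 2x2 quads and that the fine derivative of a value is simply the difference between the two pixels in the same row or column of that quad. The `quad` data and all function names are made up for illustration. Note that this only recovers neighbors inside the same quad, so it may replace only some of the fetches.

```python
# CPU-side model of one 2x2 pixel quad; this is not shader code.
# quad[y][x] is the value each pixel already fetched (hypothetical data).

def ddx_fine(quad, y):
    """Horizontal difference within row y: right pixel minus left pixel."""
    return quad[y][1] - quad[y][0]

def ddy_fine(quad, x):
    """Vertical difference within column x: bottom pixel minus top pixel."""
    return quad[1][x] - quad[0][x]

def horizontal_neighbor(quad, x, y):
    """Recover the other pixel in the same row without a second fetch."""
    own = quad[y][x]
    d = ddx_fine(quad, y)
    # The left pixel (x == 0) adds the difference, the right one subtracts it.
    return own + d if x == 0 else own - d

def vertical_neighbor(quad, x, y):
    """Recover the other pixel in the same column without a second fetch."""
    own = quad[y][x]
    d = ddy_fine(quad, x)
    # The top pixel (y == 0) adds the difference, the bottom one subtracts it.
    return own + d if y == 0 else own - d

quad = [[0.25, 0.75],
        [0.0,  1.0]]

print(horizontal_neighbor(quad, 0, 0))  # value held by the pixel at (1, 0)
print(vertical_neighbor(quad, 1, 1))    # value held by the pixel at (1, 0)
```

In a real shader the sign would depend on the pixel's position within the quad, which you would have to derive from the screen position.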
Instead of alpha = max(alpha, sample) you can do alpha += sample. I’m not sure it will have an impact, since the compiler might already have optimized it.
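One caveat on the max versus add swap, sketched on the CPU in Python rather than in the shader: the two only give the same result if the samples are effectively 0 or 1 and the sum gets clamped to 1 afterwards. I am assuming here that the output is clamped (modelled with `min(alpha, 1.0)`); the function names and sample values are made up for illustration.

```python
def accumulate_max(samples):
    """Original form: keep the largest sample seen so far."""
    alpha = 0.0
    for s in samples:
        alpha = max(alpha, s)
    return alpha

def accumulate_add(samples):
    """Suggested form: sum the samples, then clamp to 1.0.
    The clamp is an assumption standing in for a saturated output."""
    alpha = 0.0
    for s in samples:
        alpha += s
    return min(alpha, 1.0)

binary = [0.0, 1.0, 0.0, 1.0]   # mask-like samples: both forms agree
soft = [0.25, 0.5]              # fractional samples: the forms differ

print(accumulate_max(binary), accumulate_add(binary))
print(accumulate_max(soft), accumulate_add(soft))
```

So if your samples have soft edges, the added version will come out brighter than the max version.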
What does the shader look like right now?