How can I make this 2D light shader run faster?

tval · July 15, 2020, 8:05pm

I am quite a noob when it comes to shaders, so any help is appreciated. I have a 2D light shader that struggles with performance in the ‘for’ loop that processes the multiple light inputs. I have read about some of the shading techniques for multiple lights (such as deferred light shading), but I have a hard time understanding them or making them work properly. Do I need to scrap this, or is there a way to make this run better without losing a ton of FPS?

#if OPENGL
#define SV_POSITION POSITION
#define VS_SHADERMODEL vs_3_0
#define PS_SHADERMODEL ps_3_0
#else
#define VS_SHADERMODEL vs_4_0_level_9_1
#define PS_SHADERMODEL ps_4_0_level_9_1
#endif

Texture2D ScreenTexture : register(t0);

sampler TextureSampler : register(s0)
{
	Texture = (ScreenTexture);
};

struct VertexShaderOutput
{
	float4 Position : SV_POSITION;
	float4 Color : COLOR0;
	float2 TextureCoordinates : TEXCOORD0;
};

float LightRadius = 20.0;
float2 VirtualSize = float2(800, 480);
float3 Lights[64];
uint NumberOfLights = 0;
float4 Intensity = 1.5;
float3 Ambience = float3(0.8, 0.6, 0.7);

float4 MainPS(VertexShaderOutput input) : COLOR
{
	//Look up the texture value
	float4 tex = tex2D(TextureSampler, input.TextureCoordinates);

	//Convert the TextureCoordinate into pixel coordinates
	float2 pixelLocation = VirtualSize * input.TextureCoordinates;

	//The final light level after calculations are complete
	float result = 0.0;

	float currentResult;

	for (uint i = 0; i < 64; i++)
	{
		if (i < NumberOfLights && result <= 1.0)
		{
			//r and g are the x and y coordinates of the point light
			//b is the radius of the light
			currentResult = Lights[i].b - distance(pixelLocation, float2(Lights[i].r, Lights[i].g));
			currentResult /= Lights[i].b;
			if (currentResult > 0.0) result += currentResult;
		}
	}

	//Don't allow light level to go below .05
	if (result < 0.05) result = 0.05;

	input.Color.rgb *= Ambience + result * Intensity;

	return input.Color * tex;
}


technique SpriteDrawing
{
	pass P0
	{
		PixelShader = compile PS_SHADERMODEL MainPS();
	}
};

Trinith · July 15, 2020, 9:20pm

I suspect others will have a lot better suggestions than I will, but one thing that jumps out at me is that you look like you’re supporting up to 64 lights. It seems unreasonable that any given object will be within range of more than a handful of lights at any given time, so perhaps one optimization could be for you to do that check outside the shader at the object level, then only pass in lights that are actually going to light your object.

Then you can probably make use of something like a Quad Tree to make object lookup much more efficient, collecting those lights that are within some range of your object.
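
To make the per-object idea concrete, here is a minimal CPU-side sketch in C. The shader is HLSL and the game code would be C#; C is used only because it reads close to both. It assumes each light is stored as position plus radius (like the (r, g, b) packing in the first shader) and that each object has an axis-aligned bounding rectangle. `Light`, `Rect` and `collect_lights` are hypothetical names. It brute-forces every light; a quadtree would only speed up finding the candidate lights, not change the circle-versus-rectangle test.

```c
#include <stddef.h>

/* Hypothetical light record: position plus radius, matching the
   (r, g, b) = (x, y, radius) packing used in the first shader. */
typedef struct { float x, y, radius; } Light;

/* Axis-aligned bounds of the object being drawn, in the same units. */
typedef struct { float left, top, right, bottom; } Rect;

static float clampf(float v, float lo, float hi)
{
    return v < lo ? lo : (v > hi ? hi : v);
}

/* A light can affect the object if the point of the rectangle closest to
   the light centre lies strictly inside the light's radius. */
static int light_touches_rect(const Light *l, const Rect *r)
{
    float cx = clampf(l->x, r->left, r->right);
    float cy = clampf(l->y, r->top, r->bottom);
    float dx = l->x - cx, dy = l->y - cy;
    return dx * dx + dy * dy < l->radius * l->radius;
}

/* Copies up to max_out lights that can affect obj into out and returns
   how many were copied; only these would be passed to the shader. */
size_t collect_lights(const Light *lights, size_t count, const Rect *obj,
                      Light *out, size_t max_out)
{
    size_t n = 0;
    for (size_t i = 0; i < count && n < max_out; i++)
        if (light_touches_rect(&lights[i], obj))
            out[n++] = lights[i];
    return n;
}
```

The strict `<` matches the original shader, where a pixel exactly at the radius contributes nothing.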

nkast · July 15, 2020, 9:26pm

Start by breaking some of the math functions down into their simpler forms:
distance(a,b) <=> length(a-b) <=> sqrt(dot(a-b,a-b))

You can then move the expensive sqrt after the distance check, because for non-negative values (a > b) <=> (a^2 > b^2), so the range test can compare the squared distance against the squared radius.

If you can spare one more float for each light, you can precalculate the reciprocal of b and replace the expensive division with a less expensive multiplication: (a / b) <=> (a * (1/b)).

You can also move the checks into the for statement itself and avoid unnecessary iterations (unless HLSL doesn’t like that).

You would end up with something like the following. I did not try to compile this code, so it might contain bugs…

// Lights must be widened from float3 to float4 to hold the extra value:
// float4 Lights[64]; // xy = position, b = radius, a = 1 / radius (set on the CPU)
int numOfLights = NumberOfLights;
if (numOfLights > 64)
    numOfLights = 64;

for (int i = 0; i < numOfLights && result <= 1.0; i++)
{
    float2 lightDist2 = pixelLocation - Lights[i].xy;
    float sqLightDist = dot(lightDist2, lightDist2);
    // Compare squared values so sqrt only runs for pixels inside the light
    if ((Lights[i].b * Lights[i].b) > sqLightDist)
    {
        currentResult = Lights[i].b - sqrt(sqLightDist);
        // Multiply by the precomputed 1/radius instead of dividing by the radius
        currentResult = currentResult * Lights[i].a;
        result += currentResult;
    }
}

tval · July 15, 2020, 9:39pm

Thanks for the suggestions. I will try them out. The reason ‘NumberOfLights’ is not used directly in the for loop condition, but is instead checked inside the loop, is that I kept getting compilation errors on Android stating that there was no known method for ‘<’ with an int on the left and a float on the right. I don’t know why I got that error, because NumberOfLights is not declared as a float. Explicitly casting it to int (like below) did not help either.

for (uint i = 0; i < (int)numOfLights && result <= 1.0; i++)
{
}

@Trinith, I was not sure how to reduce the number of lights. Some of the lights are only a tile’s width/height, so each tile would need its own light, which means I could have 144 lights on the screen. Would I not have to check every pixel against each of them?

nkast · July 15, 2020, 9:46pm

Also, I think these lines

	  currentResult = Lights[i].b - sqrt(sqLightDist);
	  currentResult = currentResult * Lights[i].a;

can be rewritten as

	  currentResult = 1 - sqrt(sqLightDist) * Lights[i].a;

Notice that now the only use of ‘b’ is in the form of b squared (b^2), which means you can precalculate (b*b) on the CPU and remove the multiplication from the shader.

//b is the radius^2 of the light
...
if ( Lights[i].b > sqLightDist )
....

Trinith · July 15, 2020, 10:06pm

Actually, I’m not sure at what point you can set a parameter on a shader. Does it have to be done only once before all draw calls are made, or can it be done on a per-object basis? What impact does either have on performance? Are there ways to batch up the drawing process so you can cut down the number of lights passed to the shader at any given time?

I will say this… 144 lights on the screen at any given time for a 2D application seems like a lot, and maybe that’s something you might want to take a look at. Obviously this is your game and your vision, so you’re going to have to do what you think is best; I’m just trying to ask questions that might lead you to solutions.

nkast · July 15, 2020, 10:34pm

If you have mostly static lights you can bake them against the static geometry.

tval · July 15, 2020, 10:36pm

I looked into that, but they are not static. They are lights from moving people, spell animations, or torches on the maps. I was intrigued by this method, though. Could I make a Texture2D from an array of lights and pass it into the shader, so that the computations complete faster than calculating them in the shader?

tval · July 15, 2020, 11:15pm

@nkast Wow, you are a genius. The changes above made all the difference; I don’t have any drop in FPS now. Thanks a bunch. Now I just need to figure out how to feather the edge of the circle better.

Trinith · July 16, 2020, 1:52am

I’m glad you got it solved!

I think I’m going to have to do some experiments tomorrow though, I actually wanna see how selectively passing light information to the shader works out, if it even does lol. Could be a fun thing to do!

nkast · July 16, 2020, 6:57am

I don’t think it will be worth it if the lights are not static; in the end you are computing the same thing, just in two steps.

Thanks! Micro-optimization feels rewarding because it is quick and easy.
I like the ideas @Trinith threw out as well, about partitioning the space with a quadtree or something similar to reduce the number of lights you send to the shader. It will take more time to implement, but in the end it might be worth it, as long as you have CPU headroom to spare and the lights are still a good percentage of the cost, of course…

tval · July 16, 2020, 1:56pm

Thanks. I have a bit of logic before putting the lights into the shader that removes duplicate lights, or lights that are within “x” distance of another, depending on the size of the light. I assume the quad tree that reduces the number of lights each pixel checks against would be within the shader (since the shader checks every pixel against the input lights)?
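
The exact merge logic isn’t shown in the thread, so here is only a minimal C sketch of one way to do it, assuming lights are (x, y, radius) and that “x distance” is a hypothetical fraction of the larger radius. It is a greedy O(n²) pass, which is fine for a few hundred lights; `merge_close_lights` and `fraction` are hypothetical names.

```c
#include <stddef.h>

typedef struct { float x, y, radius; } Light;

/* Greedy merge: a light is dropped when its centre lies within
   fraction * (larger radius) of a light that was already kept.
   Kept lights are compacted to the front of the array in place;
   returns the new count. */
size_t merge_close_lights(Light *lights, size_t count, float fraction)
{
    size_t kept = 0;
    for (size_t i = 0; i < count; i++) {
        int duplicate = 0;
        for (size_t k = 0; k < kept; k++) {
            float dx = lights[i].x - lights[k].x;
            float dy = lights[i].y - lights[k].y;
            float r = lights[i].radius > lights[k].radius ? lights[i].radius : lights[k].radius;
            float limit = fraction * r;
            if (dx * dx + dy * dy <= limit * limit) {
                /* Keep the kept light's position but the larger radius. */
                if (lights[i].radius > lights[k].radius)
                    lights[k].radius = lights[i].radius;
                duplicate = 1;
                break;
            }
        }
        if (!duplicate)
            lights[kept++] = lights[i];
    }
    return kept;
}
```

Note that this reduces the light count on the CPU; it does not reduce how many lights each pixel loops over unless the shader loop is also bounded by the new count.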

The problem I have now is that for each light added, I see about a drop of 5 fps. So it still needs to be more efficient.

Perhaps something like deferred lighting is necessary: draw one render target for the main screen and another for the lights. The light render target could be scaled down to 1/4 or 1/8 of the size of the main render target. Then use some alpha blending to draw the light render target over the main one. I suppose I would have to test it, but I wonder how much better this would be for drawing lights for people who have old phones.

Trinith · July 16, 2020, 9:46pm

I’ve been playing around. I think I understand this a bit better now. You’re running this shader over a render target output of your entire scene, aren’t you? I think that might be where some of the confusion came from… if that’s the case, I can totally see how you might have a ton of lights to check for every pixel. I was thinking about it from another perspective, where the lighting calculations are done for each draw call that’s made. So, for example, every time you render a tile, you run the shader.

For your method, you only run the shader once, when you dump your render target to the screen. However, every single pixel of that render target has to check against every light. With my approach you have to run the shader with every draw call (for sprites that should be lit), but you only have to pass the lights that are actually near that object.
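
A rough count of per-frame work makes the difference concrete. Assuming the 800×480 virtual size from the first shader, the 144 lights mentioned earlier, and a hypothetical average of 4 nearby lights per lit pixel in the per-draw approach (ignoring overdraw):

$$
\begin{aligned}
\text{full-screen pass: } & 800 \times 480 \times 144 = 55{,}296{,}000 \text{ light evaluations per frame} \\
\text{per-draw pass: } & 800 \times 480 \times 4 = 1{,}536{,}000 \text{ light evaluations per frame}
\end{aligned}
$$

The per-draw figure grows with overdraw and ignores the cost of setting parameters for every draw call, so it is an upper bound on the savings rather than a prediction.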

I’m honestly not sure which is the better approach. I’m currently playing about and have a scene set up to render a tile map I made in Tiled. It took a bit because I just couldn’t get MonoGame.Extended.Tiled (and content pipeline stuff) working, so I wrote my own file parser and renderer.

I’ll continue to play with it.

tval · July 16, 2020, 10:38pm

I use MGE.Tiled in the project. It can be a bit tricky for someone just starting out. Essentially, you add the tmx files to the pipeline, then add the tsx file to the project as ‘Content’->‘Copy if Newer’ (but not to the pipeline). MGE draws the map as primitives, so I was unsure how to apply a shader with the lights to those primitives, since the light information is stored as a tile PropertyObject in the tileset file itself. Otherwise, I think drawing entities with the effect and just passing in the 3-4 nearby lights would be easy. It was just the map part that made it tough and steered me in the direction of drawing the lights over the entire scene after the fact.

Trinith · July 16, 2020, 10:44pm

How did you get MGE to load the tmx file, if not through the content importer? That’s where I got bogged down. My plan was to use the MGE TiledMap object, but write my own renderer that could use the effect.

I’m currently working on the effect part, but apparently you can’t access SV_POSITION with ps_4_0_level_9_1 shaders… and so googling has commenced

*Edit: Apparently you just can’t use it and so you have to pass in some information about where in the world the object you’re rendering is. What a pain… lol.

tval · July 16, 2020, 10:49pm

The MG pipeline imports the tmx file, but not the tsx file. So in other words, the map file, but not the tileset file. I add a Custom Property to the tile in the tileset such as light -> 500 (which would be the radius of the light)

Trinith · July 16, 2020, 10:55pm

I’m still not quite sure what you mean, but I’m thinking maybe that’s for another time. I’ve got something working now that renders a map made in Tiled. It might not be the most robust thing, but it works. My goal is to get a basic light shader working, so I figure one thing at a time.

I feel like I’m close but I think I’m having trouble calculating the position and distance from my light source.

Trinith · July 16, 2020, 11:46pm

Ugh, whenever I work in shaders, I realize why I hate working in shaders! Maybe you can help me out a bit so I can help you out.

I’m not getting the expected results from my shader. I wrote a very simple one to set the r value of my output to the input’s U texture coordinate value. Here is the shader…

#if OPENGL
#define SV_POSITION POSITION
#define VS_SHADERMODEL vs_3_0
#define PS_SHADERMODEL ps_3_0
#else
#define VS_SHADERMODEL vs_4_0_level_9_1
#define PS_SHADERMODEL ps_4_0_level_9_1
#endif

Texture2D SpriteTexture;
sampler2D InputSampler = sampler_state
{
	Texture = <SpriteTexture>;
};

struct VertexShaderOutput
{
	float4 Position : SV_POSITION;
	float4 Color : COLOR0;
	float2 UV : TEXCOORD0;
};

float4 MainPS(VertexShaderOutput input) : COLOR
{
	float4 output = float4(0, 0, 0, 0);
	output.rgb = input.UV.x;
	output.a = 1;

	return output;
}

technique BasicColorDrawing
{
	pass P0
	{
		PixelShader = compile PS_SHADERMODEL MainPS();
	}
};

I’m just rendering a set of tiles, and so I would expect each rendering to be a gradient square from black to white. Instead I get this…

I suspect this is why I’m having trouble calculating my lighting, because my UV values aren’t as expected. My texture is a texture sheet, with many sprites on it at once. Could that have anything to do with it?

Actually, looking at the gradients for the tiles, I bet that’s what’s happening here. The UV values are relative to the overall texture, not the area of the texture I’m rendering. I’m going to have to calculate the local UV values apparently… fun times!
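
The fix is to tell the shader which part of the atlas the current sprite occupies, then remap. A minimal C sketch of both halves, assuming the sprite is drawn from a pixel source rectangle in the atlas (`SourceRect`, `UVRange`, `u_range`, `v_range` and `local_coord` are hypothetical names; the ranges correspond to what the shader later in this thread calls xURange and xVRange):

```c
/* Hypothetical source rectangle of one sprite inside the atlas, in pixels
   (what would be passed to SpriteBatch.Draw as the source rectangle). */
typedef struct { int x, y, w, h; } SourceRect;

/* A [min, max] interval in texture coordinates. */
typedef struct { float min, max; } UVRange;

/* CPU side: the U and V ranges the sprite occupies in the atlas. */
UVRange u_range(SourceRect s, int atlas_width)
{
    UVRange r = { (float)s.x / atlas_width, (float)(s.x + s.w) / atlas_width };
    return r;
}

UVRange v_range(SourceRect s, int atlas_height)
{
    UVRange r = { (float)s.y / atlas_height, (float)(s.y + s.h) / atlas_height };
    return r;
}

/* Shader side: remap an atlas coordinate to 0..1 across just this sprite. */
float local_coord(float atlas_coord, UVRange r)
{
    return (atlas_coord - r.min) / (r.max - r.min);
}
```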

tval · July 16, 2020, 11:50pm

So I have decided to change the shader and expand upon some of the ideas mentioned above. Here is what I am doing:

Since I am only interested in a simple point light in this game, I created a round sprite of white pixels that slowly feathers out to fully transparent near the edges. I use this sprite to draw to a blank render target, with the destination rectangle sized to the light’s radius, for as many lights as I need. The result is a RenderTarget2D similar to the static images nkast posted above. Then I use the shader to draw the ambient color over my entire scene and to combine in the lights render target.

So far I can render 500 lights without any drop in FPS. So for me, it appears to be the easy solution.

Trinith · July 17, 2020, 2:17am

Haha the time honoured programmer approach of “scrap it, we’ll do it a different way!” Honestly, that’s probably the better way to go…

In the interests of learning, I’m going to keep plugging away with my example because I still want to see if the actual number of lights can be cut down in the shader… and I think the answer is yes; however, the shader itself becomes a lot more complex because I have to calculate a local UV value from an overall Texture Atlas UV so that I can transform each pixel into a world position, then compare that against the light.

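// Parameters set from C# before each sprite is drawn (types inferred from how they are used below)
float2 xURange;        // (min, max) U of this sprite's source rectangle in the atlas
float2 xVRange;        // (min, max) V of this sprite's source rectangle in the atlas
float2 xWorldPosition; // world-space top-left corner of the sprite
float2 xWorldSize;     // world-space width and height of the sprite
float4 xGlobalLight;   // ambient light colour
float2 lightPos;       // the single hard-coded light, in world space
float lightRad;        // its radius, in world units

float4 MainPS(VertexShaderOutput input) : COLOR
{
	float4 output = tex2D(InputSampler, input.UV) * input.Color;

	// Remap the atlas UV to 0..1 across just this sprite
	float2 localUV = float2(
		(input.UV.x - xURange.x) / (xURange.y - xURange.x),
		(input.UV.y - xVRange.x) / (xVRange.y - xVRange.x)
	);

	// Convert the local UV into a world-space pixel position
	float2 pixelPosition = xWorldPosition + xWorldSize * localUV;

	float4 finalLight = xGlobalLight;
	float distToLight = distance(lightPos, pixelPosition);
	if (distToLight < lightRad)
	{
		// p is 1 at the light centre and 0 at its edge
		float p = 1 - distToLight / lightRad;
		float4 lightColor = float4(1, 1, 0, 1);
		// Blend from the ambient light towards the light colour
		finalLight = ((p * lightColor) + ((1 - p) * xGlobalLight));
		finalLight.a = 1;
	}

	output *= finalLight;

	return output;
}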

This is currently using a single, hard coded light. I will expand this later.

There is one big issue though… I had to change the SpriteBatch begin call to Immediate mode so that it can properly process each of these textures as they are rendered. Unfortunately this is a massive performance hit, going from around 5800 fps to 1500 fps. The tradeoff is that I can now filter out any light that’s not an actual part of my scene and should be able to support any number of real-time lighting effects. Unless someone knows a way to get this working in deferred mode.

For the purposes of this thread, I think I wanna see it through and maybe post the code somewhere as an example. I dunno how useful that will be, but there it is!