[SOLVED] Limitations on number of 2D point lights using HLSL

I’m working on a 2D point light system for my game, and I’m passing an HLSL effect to SpriteBatch when I draw. I’ve got the effect working right—I think—but I’m going to run into a problem with it. The problem is, HLSL does not support dynamically sized arrays, and I want to be able to support an arbitrary number of point lights.

Here is the HLSL file:

#if OPENGL
	#define SV_POSITION POSITION
	#define VS_SHADERMODEL vs_3_0
	#define PS_SHADERMODEL ps_3_0
#else
	#define VS_SHADERMODEL vs_4_0_level_9_1
	#define PS_SHADERMODEL ps_4_0_level_9_1
#endif

texture Texture;

//Arbitrary limit of 4 lights...
float4 lights[4];

sampler TextureSampler = sampler_state{ Texture = <Texture>; };

struct VertexShaderOutput
{
	float4 Color : COLOR0;
	float2 TextureCoordinate : TEXCOORD0;
};

float4 PixelShaderFunction(VertexShaderOutput input) : COLOR0
{
	float4 color = tex2D(TextureSampler, input.TextureCoordinate);

	//Convert the TextureCoordinate into pixel coordinates
	float2 pixelLocation = float2(1280, 720) * input.TextureCoordinate;

	//The final light level after calculations are complete
	float result = 0;

	float currentResult;

	for (int i = 0; i < lights.Length; i++)
	{
		//r and g are the x and y coordinates of the point light
		//b is the radius of the light
		//a is the opacity or brightness of the light
		currentResult = lights[i].b - distance(pixelLocation, float2(lights[i].r, lights[i].g));
		currentResult /= lights[i].b;
		currentResult *= lights[i].a;
		if (currentResult > 0) result += currentResult;
	}

	//Don't allow light level to go below .2
	if (result < .2) result = .2;

	return result * color;
}

technique Technique1
{
	pass Pass1
	{
		PixelShader = compile PS_SHADERMODEL PixelShaderFunction();
	}
}

I’m pretty new to HLSL, so I apologize if any of that code is laughably bad.

Is there any way I could use this type of point light implementation and allow for an arbitrary or dynamic amount of lights? And is there anything incorrect about the HLSL code?

Define a maximum amount of lights, like for example 50.
Make another variable that tells you how many lights are currently used.

For example, if you only use 5 lights, pass a variable like CurrentAmountOfLights = 5 along with the light array.

Then in your shader, loop through the lights up to CurrentAmountOfLights.
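
As a rough sketch of that suggestion, the shader side might look something like this. MAX_LIGHTS and AccumulateLight are made-up names, and the light layout is borrowed from the shader in the question. Whether a loop bounded by a variable compiles as-is depends on the shader profile being targeted.

```hlsl
// Illustrative upper bound; pick whatever the target profile can handle.
#define MAX_LIGHTS 50

// xy = light position in pixels, z = radius, w = brightness
float4 lights[MAX_LIGHTS];

// Set from the game each frame; only the first
// CurrentAmountOfLights entries of the array are read.
int CurrentAmountOfLights = 0;

// Hypothetical helper: sums the contribution of every active light.
float AccumulateLight(float2 pixelLocation)
{
	float result = 0;

	for (int i = 0; i < CurrentAmountOfLights; i++)
	{
		// Linear falloff: 1 at the light's centre, 0 at its radius.
		float falloff = (lights[i].z - distance(pixelLocation, lights[i].xy)) / lights[i].z;

		// Pixels outside the radius get a negative falloff; clamp it to 0.
		result += max(0, falloff * lights[i].w);
	}

	return result;
}
```

The array keeps its fixed size, so the game always uploads MAX_LIGHTS entries' worth of storage, but unused entries are never touched by the loop.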


Thanks. Would a 50-iteration for loop have a significant performance impact during a pixel shader? I tried, just out of curiosity, setting the float4 lights array to have 250 elements, and the shader wouldn’t compile, saying

unable to unroll loop, does not appear to terminate in a timely manner (147 iterations), use the [unroll(n)] attribute to force an exact higher number

I’m not sure where the 147 iterations comes from, but does that mean any number higher than that is a bad idea?

I suppose the only other option is to draw light gradient textures to a RenderTarget2D and then use a shader to overlay those on the main RenderTarget2D, but I’d rather avoid doing that if I can.

Also, is constructing and passing a large-ish array of Vector4s to the shader once per frame an expensive operation?

If I try to create an array of 100 float4s, I get this compile error:

Invalid const register num: 100. Max allowed is 31.

If I bring it down to 30, I get these errors:

Compiled shader code uses too many arithmetic instruction slots (333). Max. allowed by the target (ps_2_0) is 64.
Compiled shader code uses too many instruction slots (334). Max. allowed by the target (ps_2_0) is 96.

Which seems odd. Since this is a DirectX project, shouldn’t the target pixel shader model be ps_4_0?

The highest number of array elements I can use before it refuses to compile is 5. That seems a little low.

EDIT: I suspected the issue might have to do with my graphics profile, so I changed it from Reach to HiDef, and it let me compile a shader with an array size of 150. Are there any significant repercussions from targeting HiDef as opposed to Reach?

Sorry for the triple post, but I’m still having problems, even after switching the target profile to HiDef.

It seems like the shader compiler simply won’t let me use a non-constant number for the condition of the for loop. I’m getting this error again:

error X5589: Invalid const register num: 151. Max allowed is 31.

And the location of the error, (28,9), doesn’t seem to make any sense.

Here’s my newly edited shader code:

#if OPENGL
#define SV_POSITION POSITION
#define VS_SHADERMODEL vs_3_0
#define PS_SHADERMODEL ps_3_0
#else
#define VS_SHADERMODEL vs_4_0_level_9_1
#define PS_SHADERMODEL ps_4_0_level_9_1
#endif

texture Texture;

float4 lights[75];

float4 lightColors[75];

int NumberOfLights = 0;

sampler TextureSampler = sampler_state { Texture = <Texture>; };

struct VertexShaderOutput
{
	float4 Color : COLOR0;
	float2 TextureCoordinate : TEXCOORD0;
};

float4 PixelShaderFunction(VertexShaderOutput input) : COLOR0
{
	float4 color = tex2D(TextureSampler, input.TextureCoordinate);

	//Convert the TextureCoordinate into pixel coordinates
	float2 pixelLocation = float2(1280, 720) * input.TextureCoordinate;

	//The final light level after calculations are complete
	float4 result = 0;

	float currentResult;

	for (int i = 0; i < NumberOfLights; i++)
	{
		//r and g are the x and y coordinates of the point light
		//b is the radius of the light
		//a is the opacity or brightness of the light
		currentResult = lights[i].b - distance(pixelLocation, float2(lights[i].r, lights[i].g));
		currentResult /= lights[i].b;
		currentResult *= lights[i].a;

		if (currentResult > 0) result += lightColors[i] * currentResult;
	}

	//Don't allow light level to go below .2
	if (result.r < .2) result.r = .2;
	if (result.g < .2) result.g = .2;
	if (result.b < .2) result.b = .2;

	return result * color;
}

technique Technique1
{
	pass Pass1
	{
		PixelShader = compile PS_SHADERMODEL PixelShaderFunction();
	}
}

When I change the for loop to use a constant, like this:

for (int i = 0; i < 5; i++)

…it compiles fine, but only if the number of iterations is five or below. Any higher, and I get an error like this:

error X5608: Compiled shader code uses too many arithmetic instruction slots (73). Max. allowed by the target (ps_2_0) is 64.

There again, it says the target is ps_2_0, even though I changed the profile to HiDef.

Any ideas about how I can use more than five point lights?

EDIT: Another update. After finding out that ps_4_0_level_9_1 is like a compatibility mode of shader level 4.0, I changed it to ps_4_0_level_9_3, and that allowed me to up the constant from 5 to 15, but no higher, and still didn’t fix the problem of not allowing a variable amount of for loop iterations.

If I drop the “level_9_3” part entirely, the shader will compile with an even higher constant, but then a runtime exception is immediately thrown by SharpDX, saying, most helpfully, “The parameter is incorrect.” I’m assuming this is due to my laptop using integrated Intel HD Graphics 4600, but the DirectX dialog says it supports DirectX 11…

Is there any shader level allowed past 4.0 level 9.3?

UPDATE: The SharpDX exception was due to me not setting the graphics profile to HiDef properly in the Game1 constructor.

You can have a variable amount of lights, but you need to specify the upper bound exactly like the error says: by adding [unroll(upper bound)] right before the loop. There’s a restriction to the number of instructions shaders can use. If you need more lights (15 should really be plenty, especially in 2D) you can upgrade the shader profile to 4.0 or higher. You’ll need to set the GraphicsProfile to HiDef in your game if you do that, otherwise the shader will fail to load.
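
A minimal sketch of what that attribute looks like in practice, assuming an upper bound of 15 and reusing the parameter names from the shader above. MAX_LIGHTS and AccumulateLights are illustrative names, not anything MonoGame defines:

```hlsl
// Assumed upper bound; the arrays and the unroll count must agree.
#define MAX_LIGHTS 15

float4 lights[MAX_LIGHTS];      // xy = position, z = radius, w = brightness
float4 lightColors[MAX_LIGHTS];
int NumberOfLights = 0;         // actual count, set from the game

// Hypothetical helper: sums the coloured contribution of the active lights.
float4 AccumulateLights(float2 pixelLocation)
{
	float4 result = 0;

	// Tells the compiler the loop never runs more than MAX_LIGHTS times,
	// so it can unroll it even though the condition uses a variable.
	[unroll(MAX_LIGHTS)]
	for (int i = 0; i < NumberOfLights; i++)
	{
		float falloff = (lights[i].z - distance(pixelLocation, lights[i].xy)) / lights[i].z;
		result += max(0, lightColors[i] * falloff * lights[i].w);
	}

	return result;
}
```

The unrolled code still has to fit within the instruction limit of the target profile, so the bound cannot be raised arbitrarily.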

You can actually just access those using .x and .y rather than .r and .g. On float4 you can do either .rgba or .xyzw.

It’s faster to use min and max in shaders, i.e. result += max(0, lightColors[i] * currentResult);
Same thing with the clamping at the end.
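
For the clamp at the end, a sketch along these lines should do the same job as the three if statements. ApplyMinimumLight is a hypothetical helper name, and the minimum level is passed in as an argument purely for illustration:

```hlsl
// Hypothetical helper: enforces a floor on the accumulated light level.
float4 ApplyMinimumLight(float4 result, float minimumLightLevel)
{
	// max() works per component, so this raises r, g and b to at least
	// minimumLightLevel in one call and leaves alpha untouched.
	result.rgb = max(result.rgb, minimumLightLevel);
	return result;
}
```

Calling ApplyMinimumLight(result, 0.2) right before the final multiply would match the hard-coded .2 floor in the shader above.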

Sure, you can use:
ps_4_0
ps_5_0
Official doc

Even 6_0, but that’s a new feature that older graphics cards won’t support.

Concerning Intel HD… I don’t know if it is fast enough to handle many instructions with ps_5_0. The upside is that it will force you to optimize the code so your app runs smoothly, something not everyone does.

Oh, it’s very important not to just use basic for loops.

You want to write it like this:

[loop]
for(…)

The [loop] attribute is important if you deal with a dynamic amount of lights. And it compiles a lot faster.

Thanks for all the help, guys. Last night I finally got it working using ps_4_0, which should be plenty high enough for me. It turns out, for some reason, the VertexShaderOutput needs to be very specific in order to work right. Here’s the complete shader code, with the changes @Jjagg suggested:

#if OPENGL
#define SV_POSITION POSITION
#define VS_SHADERMODEL vs_3_0
#define PS_SHADERMODEL ps_3_0
#else
#define VS_SHADERMODEL vs_4_0
#define PS_SHADERMODEL ps_4_0
#endif

Texture2D Texture : register(t0);

float4 lights[75];

float4 lightColors[75];

int numberOfLights = 0;

float minimumLightLevel;

int horizontalResolution;
int verticalResolution;

SamplerState TextureSampler
{
	Filter = Point;
	AddressU = Clamp;
	AddressV = Clamp;
};

//This was taken straight from some MSDN example
struct VS_OUTPUT
{
	float4 Position   : SV_POSITION; // vertex position 
	float4 Diffuse    : COLOR0;      // vertex diffuse color (note that COLOR0 is clamped from 0..1)
	float2 TextureUV  : TEXCOORD0;   // vertex texture coords
};

float4 PixelShaderFunction(VS_OUTPUT input) : COLOR0
{
	float4 color = Texture.Sample(TextureSampler, input.TextureUV);

	float2 pixelLocation = input.Position.xy;

	//The final light level after calculations are complete
	float4 result = 0;

	float currentResult;

	for (int i = 0; i < numberOfLights; i++)
	{
		//x and y are the coordinates of the point light
		//z is the radius of the light
		//a is the opacity or brightness of the light
		currentResult = lights[i].z - distance(pixelLocation, lights[i].xy);
		currentResult /= lights[i].z;
		currentResult *= lights[i].a;

		//if (currentResult > 0) result += lightColors[i] * currentResult;
		result += max(0, lightColors[i] * currentResult);
	}

	//Don't allow light level to go below minimumLightLevel
	result.r += max(0, minimumLightLevel - result.r);
	result.g += max(0, minimumLightLevel - result.g);
	result.b += max(0, minimumLightLevel - result.b);

	return result * color;
}

technique Technique1
{
	pass Pass1
	{
		PixelShader = compile PS_SHADERMODEL PixelShaderFunction();
	}
} 

For some reason, if I delete float4 Diffuse : COLOR0; from the VS_OUTPUT struct, the pixel shader won’t work right, even though I don’t use that variable at all in the pixel shader. I obviously still have a lot to learn about HLSL.

The way I have the shader code now allows a variable amount of for loop iterations without adding that attribute, which I’m assuming is because I’m using ps_4_0 instead of ps_4_0_level_9_[n].

Jjagg said: “If you need more lights (15 should really be plenty, especially in 2D)”

The reason I want to have more than fifteen lights is that I’m planning on having quite a few objects as light sources, just to see what it looks like.

What does adding the [loop] attribute do? Do I need it? It seems to work fine without it.

if you don’t include it you can get stuff like this

unable to unroll loop, does not appear to terminate in a timely manner (147 iterations), use the [unroll(n)] attribute to force an exact higher number

By default, for loops get unrolled, which is equivalent to adding the [unroll] attribute.

This means that the code for

for (int i = 0; i < 10; i++)
{
	somecode(i);
}

will be unrolled in the compiled code and look like this:

somecode(0)
somecode(1)
somecode(2)
somecode(3)

etc.
This is good for speed since you do not have to check whether i < 10 on every iteration. It’s also good since the compiler checks every case for bugs.

At the same time, if you loop a different amount of times each time it makes no sense to unroll the for loop.

If you put [loop] before the for loop it will stay like that in the compiled code.

That means that you can early out the loop.
For example if you only have 2 lights you don’t need to check all 50 and cancel the contribution of 48, instead just abort after 2.
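
To make that concrete, here is a small sketch of the loop with the attribute applied. It assumes a profile with real flow control, such as the ps_4_0 target used above, and AccumulateLightsDynamic is just an illustrative helper name:

```hlsl
float4 lights[75];        // xy = position, z = radius, w = brightness
float4 lightColors[75];
int numberOfLights = 0;   // actual count, set from the game

// Hypothetical helper: sums the coloured contribution of the active lights.
float4 AccumulateLightsDynamic(float2 pixelLocation)
{
	float4 result = 0;

	// Keep this as a real loop in the compiled code: the condition is
	// tested at run time, so 2 lights means 2 iterations, not 75.
	[loop]
	for (int i = 0; i < numberOfLights; i++)
	{
		float falloff = (lights[i].z - distance(pixelLocation, lights[i].xy)) / lights[i].z;
		result += max(0, lightColors[i] * falloff * lights[i].w);
	}

	return result;
}
```

The loop body deliberately contains no texture sampling, since the compiler can complain about gradient-based samples inside dynamic loops.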

OK, I can keep the [loop] attribute there. It does work without it, though, and I suspect that’s because the compiler automatically refrains from unrolling the loop when I’m using a variable within the condition for the for loop. It wouldn’t really make sense for it to be unrolled when the condition is checked against a variable value, so it seems like I don’t actually need to put the [loop] attribute there.

The outputs of your vertex shader and the inputs of your pixel shader always need to match. In this case, since this effect only defines a pixel shader and you use it when drawing with a SpriteBatch, the inputs need to match the outputs of the vertex shader that SpriteBatch uses which is positions, color and texture coordinates (in that order) as can be seen here. That should probably be documented better… Check out the effect writing tips in the docs page here, there’s some other common issues listed there.

That’s good to know. I assume that the COLOR0 input is the color that’s passed to the Draw method of SpriteBatch.

That page is helpful. I’ve definitely run across the aggressive optimization of the compiler before, which can be somewhat annoying when I just temporarily want to not use a parameter to test something, and I have to comment out any C# code that sets it…

Yes, exactly.

Yes, this is a very common annoyance. A little trick I use is to keep using the parameter in the shader, but scale it so it doesn’t effectively contribute i.e. do something like

float4 colorOffset;

float4 PixelShaderFunction(float4 pos : SV_POSITION, float4 inputColor : COLOR0) : COLOR0 {
    return inputColor + colorOffset * 0.000001;
}

to effectively disable the effect of colorOffset, but keep the parameter.

Good idea. Too bad there’s no preprocessor directive you can put in, like #NOOPT or something, to disable the aggressive optimization when testing.

Considering how many people bump into it, maybe MG should allow turning down the optimization level.