Instancing for particle system

Hi,

I'm trying to optimize my particle system, just for fun. I use a single vertex buffer and index buffer for all emitters and want to speed things up with instancing.

I have one vertex buffer for my quad (texture coordinates only) and, of course, the index buffer.

I also have a vertex buffer for my instances with position, rotation, color and size.

But it just does not render anything. My custom shader works (I tested it with the old rendering path), so I must be doing something wrong :frowning:

    public sealed class ParticleProcessor : BaseDrawingProcessor<ParticleComponent>
    {
        private const int NumParticlesPerRenderingCall = 4000;
        private IndexBuffer indexBuffer;
        private VertexBuffer vertexBuffer;
        private VertexBuffer instanceBuffer;
        private VertexBufferBinding vertexBufferBinding;
        private VertexBufferBinding instanceBufferBinding;
        private VertexBufferBinding[] bindings;
        private VertexPositionColorRotationSize[] instanceVertices;
        private ParticleEffect particleEffect;

        public override void LoadContent(Scene scene)
        {
            indexBuffer = new IndexBuffer(scene.GraphicsDevice, IndexElementSize.SixteenBits, 6, BufferUsage.None);
            indexBuffer.SetData(new short[] { 0, 1, 2, 0, 2, 3 }); // 16-bit index buffers take short data

            vertexBuffer = new VertexBuffer(scene.GraphicsDevice, typeof(VertexTexture), 4, BufferUsage.None);
            vertexBufferBinding = new VertexBufferBinding(vertexBuffer);
            vertexBuffer.SetData(new VertexTexture[]
            {
                new VertexTexture(new Vector2(0, 0)),
                new VertexTexture(new Vector2(1, 0)),
                new VertexTexture(new Vector2(1, 1)),
                new VertexTexture(new Vector2(0, 1)),
            });

            instanceVertices = new VertexPositionColorRotationSize[NumParticlesPerRenderingCall];
            instanceBuffer = new VertexBuffer(scene.GraphicsDevice, typeof(VertexPositionColorRotationSize), NumParticlesPerRenderingCall, BufferUsage.WriteOnly);
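            // Vertex offset 0, instance frequency 1: this stream advances once per instance.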
            instanceBufferBinding = new VertexBufferBinding(instanceBuffer, 0, 1);

            bindings = new VertexBufferBinding[] { vertexBufferBinding, instanceBufferBinding };

            particleEffect = new ParticleEffect(scene.Content.Load<Effect>("Effects/Particles"));
        }

        private void RenderJob(GraphicsDevice graphicsDevice, ParticleRenderingJob job, Matrix viewProjection)
        {
            graphicsDevice.Indices = indexBuffer;

            particleEffect.CurrentTechnique.Passes[0].Apply();

            var particleIndex = 0;

            unsafe
            {
                for (var i = 0; i < job.Particles.Length; i++)
                {
                    var particle = job.Particles[i];

                    if (particle->Lifetime > 0)
                    {
                        instanceVertices[particleIndex].Position = particle->Position;
                        instanceVertices[particleIndex].Rotation = particle->Rotation;
                        instanceVertices[particleIndex].Size.X = particle->Size;
                        instanceVertices[particleIndex].Size.Y = particle->Size;
                        instanceVertices[particleIndex].Color = particle->Color;

                        particleIndex++;

                        if (particleIndex == NumParticlesPerRenderingCall)
                        {
                            RenderMesh(graphicsDevice, particleIndex);

                            particleIndex = 0;
                        }
                    }
                }
            }

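            // Flush the remaining partial batch.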
            RenderMesh(graphicsDevice, particleIndex);
        }

        private void RenderMesh(GraphicsDevice graphicsDevice, int numParticles)
        {
            if (numParticles > 0)
            {
                instanceBuffer.SetData(instanceVertices, 0, numParticles);

                graphicsDevice.SetVertexBuffers(bindings);
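                // Draw 2 triangles (one quad) per instance, numParticles instances.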
                graphicsDevice.DrawInstancedPrimitives(PrimitiveType.TriangleList, 0, 0, 2, numParticles);
            }
        }
    }

What confuses me is that the ordering of the vertex bindings matters. When I change the order, I get the following exception: 'An error occurred while preparing to draw. This is probably because the current vertex declaration does not include all the elements required by the current vertex shader. The current vertex declaration includes these elements: SV_Position0, COLOR0, COLOR1, TEXCOORD1, TEXCOORD0.'

That is strange, because it matches my shader:

    struct VSInput
    {
        float4 Position : SV_POSITION;
        float2 TexCoord : TEXCOORD0;
        float2 Size     : TEXCOORD1;
        float4 Color    : COLOR0;
        float  Rotation : COLOR1;
    };

I thought it could be a shader feature level problem, but I use 4_0 with DirectX, so it should work:

    #if OPENGL
    #define SV_POSITION POSITION
    #define VS_SHADERMODEL vs_3_0
    #define PS_SHADERMODEL ps_3_0
    #else
    #define VS_SHADERMODEL vs_4_0
    #define PS_SHADERMODEL ps_4_0
    #endif

    float4x4 WorldViewProjection;

    float3 CameraRight;
    float3 CameraUp;

    texture Texture;

    sampler TextureSampler = sampler_state
    {
        Texture = (Texture);

        MINFILTER = LINEAR;
        MAGFILTER = LINEAR;
        MIPFILTER = LINEAR;

        ADDRESSU = CLAMP;
        ADDRESSV = CLAMP;
    };

    struct VSInput
    {
        float4 Position : SV_POSITION;
        float2 TexCoord : TEXCOORD0;
        float2 Size     : TEXCOORD1;
        float4 Color    : COLOR0;
        float  Rotation : COLOR1;
    };

    struct VSOutput
    {
        float4 Position : SV_POSITION;
        float4 Color    : COLOR0;
        float2 TexCoord : TEXCOORD0;
    };

    // Builds a 2x2 rotation matrix and packs it into [0, 1]
    // (the vertex shader unpacks it again with * 2 - 1).
    float4 ComputeParticleRotation(float rotation)
    {
        float c = cos(rotation);
        float s = sin(rotation);

        float4 rotationMatrix = float4(c, -s, s, c);

        rotationMatrix *= 0.5;
        rotationMatrix += 0.5;

        return rotationMatrix;
    }

    VSOutput VSDefault(VSInput input)
    {
        VSOutput output;

        // Billboard the quad: offset the instance position along the
        // camera axes, scaled by the particle size.
        float3 position = input.Position.xyz;
        position += ((input.TexCoord.x - 0.5f) * input.Size.x) * CameraRight;
        position -= ((input.TexCoord.y - 0.5f) * input.Size.y) * CameraUp;

        output.Position = mul(float4(position, 1), WorldViewProjection);

        // Rotate the texture coordinates around the quad center.
        float4 rotation = ComputeParticleRotation(input.Rotation) * 2 - 1;

        float2 texCoord = input.TexCoord;
        texCoord -= 0.5;
        texCoord = mul(texCoord, float2x2(rotation));
        texCoord += 0.5;
        output.TexCoord = texCoord;

        output.Color = input.Color;

        return output;
    }

    float4 PSDefault(VSOutput input) : COLOR0
    {
        return tex2D(TextureSampler, input.TexCoord) * input.Color;
    }

    technique Default
    {
        pass P0
        {
            VertexShader = compile VS_SHADERMODEL VSDefault();
            PixelShader  = compile PS_SHADERMODEL PSDefault();
        }
    }

Anybody? It would be great to get some help; I am lost. I also checked all the calls to the driver with the graphics debugger in Visual Studio and they look fine. It must be a misunderstanding on my side.

This is because of an annoying bug in how MonoGame handles semantics; see issue #6282. Unfortunately it's not a simple issue to solve with the current shader pipeline :confused:
Basically, semantics are ignored on the Windows DX platform and the parameters are bound by position instead. To fix your issue, set up the vertex input in your shader so you can map your buffers in order (i.e. put the texture coordinate either first or last), then bind your buffers in the matching order in your CPU code:

    struct VSInput
    {
        float4 Position : SV_POSITION;
        float2 Size     : TEXCOORD1;
        float4 Color    : COLOR0;
        float  Rotation : COLOR1;
        float2 TexCoord : TEXCOORD0;
    };

    bindings = new VertexBufferBinding[] { instanceBufferBinding, vertexBufferBinding };

Does not help. I changed it to your order, but I still do not see anything.

Can you share your vertex type classes?

Sure:

    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    public struct VertexTexture : IVertexType, IEquatable<VertexTexture>
    {
        public static readonly VertexDeclaration VertexDeclaration;

        public Vector2 TextureCoordinate;

        static VertexTexture()
        {
            var elements = new VertexElement[]
            {
                new VertexElement(0, VertexElementFormat.Vector2, VertexElementUsage.TextureCoordinate, 0)
            };

            VertexDeclaration = new VertexDeclaration(elements);
        }

        // Required by IVertexType.
        VertexDeclaration IVertexType.VertexDeclaration => VertexDeclaration;
    }

and

    [StructLayout(LayoutKind.Sequential, Pack = 1)]
    public struct VertexPositionColorRotationSize : IVertexType, IEquatable<VertexPositionColorRotationSize>
    {
        public static readonly VertexDeclaration VertexDeclaration;

        public Vector3 Position;

        public Color Color;

        public float Rotation;

        public Vector2 Size;

        static VertexPositionColorRotationSize()
        {
            var elements = new VertexElement[]
            {
                new VertexElement(0,  VertexElementFormat.Vector3, VertexElementUsage.Position, 0),
                new VertexElement(12, VertexElementFormat.Color,   VertexElementUsage.Color, 0),
                new VertexElement(16, VertexElementFormat.Single,  VertexElementUsage.Color, 1),
                new VertexElement(20, VertexElementFormat.Vector2, VertexElementUsage.TextureCoordinate, 1)
            };

            VertexDeclaration = new VertexDeclaration(elements);
        }

        // Required by IVertexType.
        VertexDeclaration IVertexType.VertexDeclaration => VertexDeclaration;
    }

I omitted some stuff like GetHashCode and so on to keep it short.

This is a DX project, right? And are you on 3.6 or develop?

Yes, DX with Version 3.6

For me instancing is working in DirectX.

I don’t know if it makes a difference, but in the vertex shader I use separate input structs for the mesh and the instance buffer.

    struct VSInput
    {
        float4 pos  : POSITION;
        float3 norm : NORMAL;
        float2 uv   : TEXCOORD0;
    };

    struct VSInst
    {
        float2 uv : TEXCOORD1;
    };

    VSOutput VertexShader(VSInput input, VSInst inst)
    {
        ...

Just like you, I'm also using TEXCOORD0 and TEXCOORD1 in the shader, but in C# I use UsageIndex 0 for both. Maybe you can try that as well.
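
A minimal sketch of that change (untested), applied to the instance declaration from earlier in the thread; everything stays the same except the usage index of the texture-coordinate element:

    // Same layout as before, but usage index 0 on the texture-coordinate
    // element, while the shader semantic stays TEXCOORD1.
    var elements = new VertexElement[]
    {
        new VertexElement(0,  VertexElementFormat.Vector3, VertexElementUsage.Position, 0),
        new VertexElement(12, VertexElementFormat.Color,   VertexElementUsage.Color, 0),
        new VertexElement(16, VertexElementFormat.Single,  VertexElementUsage.Color, 1),
        new VertexElement(20, VertexElementFormat.Vector2, VertexElementUsage.TextureCoordinate, 0)
    };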

4 verts falls into geometry shader land, not instancing. The GS is a huge win when you can shift workload; in the case of particles that workload is the orientation calculation, which can be fairly complicated if you have velocity-oriented, Y-oriented, or XYZ-oriented particles. Why do it on the CPU when the GPU can expand a point into a quad?

The GS is the land of myths and is poorly understood, but a 4-vert strip is a perfect fit for it. It's also better than fattening vertex data for primitive-relative things: if you fatten your vertex data you slow EVERYTHING down, and past 64 bytes you've lost hi-z on basically all hardware except Windows (provided you use a standard layout).