Max Vertices on GPU ?

Hi !

I’m working again on my particle engine (instanced).
For instance, I only send the center of the billboard (camera aligned), as a Vector3.
When I set the number of particles to 100000, the Windows taskbar flashes and a message tells me my NVIDIA card has recovered from a problem.

When I set the number lower than 50000 there is no problem.

My nvidia drivers: 376.67

I tested with ShaderModel 4 and 5. Same result.
I checked the "Resource Limits (Direct3D 11)" page on Microsoft Learn, and it says:

Draw or DrawInstanced vertex count (including instancing) D3D11_REQ_DRAW_VERTEX_COUNT_2_TO_EXP (2³²)
DrawIndexedInstanced vertex count (including instancing) D3D11_REQ_DRAWINDEXED_INDEX_COUNT_2_TO_EXP (2³²)

Same thing here with SM4

What about MonoGame ? My vertex count is far below 2^32, even when counting 50000 centers * 4 vertices per particle.
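To put numbers on that comparison, here is a minimal sketch of the arithmetic. The class and method names (`DrawBudget`, `TotalVertices`, `FractionOfLimit`) are hypothetical helpers, not MonoGame or Direct3D API; the only outside fact used is the documented 2^32 limit quoted above.

```csharp
using System;

static class DrawBudget
{
    // Documented D3D11 limit: 2^32 vertices per draw call, instancing included.
    public const long MaxVerticesPerDraw = 1L << 32;

    // Vertices processed by one instanced draw:
    // every instance runs all vertices of the base mesh.
    public static long TotalVertices(long instanceCount, int verticesPerInstance)
        => instanceCount * verticesPerInstance;

    // Share of the documented limit used by one draw call.
    public static double FractionOfLimit(long instanceCount, int verticesPerInstance)
        => (double)TotalVertices(instanceCount, verticesPerInstance) / MaxVerticesPerDraw;
}
```

With 100000 particles and 4 vertices each, the draw uses 400000 vertices, which is less than 0.01% of the limit, so the documented vertex limit is unlikely to be the cause.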

I’ve used models with 2 million polys before…

And didn’t you use the sponza dragon, too? It has 800k polys afaik

Nope, it’s only a 3D positions array used by instanced billboards. 50000 is not that much :confused: It seems weird :confused:
I have also added another Vector3 to double the amount of data passed; it still crashes at the same particle count.
I’ll double-check my code to see if I made a mistake somewhere.

I’ll try with models later (I have already implemented this, with only 100 asteroids for now), to find at what count it crashes with them compared to billboards.

Ah about instancing… the question obviously becomes how large the buffer can be, this is not about vertices any more.

The question is how long the vertex buffer can be. 16-bit (short) would sound reasonable, and that would be about 65k entries.

I’m not sure, but maybe changing the index buffer to 32-bit can help; by default it is 16-bit.

In XNA, the Reach profile was limited to 16 bits (a C# unsigned ushort): if you tried to assign a larger array when passing it to an instance buffer, it would throw an error.

I don’t think MonoGame forces a ushort in the Reach profile; I’m pretty sure you can pass a larger range to an instance buffer. XNA did it, I’m guessing, because some cards targeted by the Reach shader profiles only supported 16-bit ranges.

So maybe just upping the vertex/pixel shader profile might work.

A model with 96 vertices and 400k instances, with the per-instance data being 4 Vector4 of 4 floats, works…
The 4 Vector4 allow passing a transformation matrix for each instance. So, numerically speaking, I’m sending at least 5 Vector4 per instance, x400000, so I’m still far under 2^32.

I was starting to wonder about the indexes last night…

I send only one vertex per particle, but what about the 4 generated in the vertex shader ? How does it work on the GPU ? Are the 4 vertices used as the “base model” for instancing recycled/reused for each instance, or are they duplicated for each instance and calculated with the transformations of each instance ?

The particles (using a well visible texture with contrast instead of smoke) with a 200 count:

One, maybe notable, difference between my model and particle buffers is that for models I use a DynamicVertexBuffer, and for the particles a simple VertexBuffer.

@willmotil How can I specify a 32-bit buffer instead of a 16-bit one ? (I’m already using the HiDef profile.) I already tried with SM5 (currently 4) and it did not change anything :confused:

I will modify my app so I can add particles on the fly and find my maximum before it crashes, instead of bisecting to find the maximum number.

When using ParticleSize = new Vector2(64.0f, 64.0f) I can draw 1 000 000 particles.
When ParticleSize = new Vector2(400.0f, 400.0f) to draw big smoke quads, it crashes…

The size is used like this in the vertex shader:

float3 temp_pos3D = + (offset.x * Size.x * Side + offset.y * Size.y * Up);
No vertices are created from the size :confused:
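As a CPU-side illustration of what that shader line does, here is a minimal sketch using `System.Numerics`. Everything in it is an assumption made for illustration: the `BillboardSketch` class, the corner offsets of ±0.5, and the idea that the particle center is added to the offset term. With instancing, the same four base vertices are read again for every instance rather than duplicated in memory, and the size only moves the corners.

```csharp
using System;
using System.Numerics;

static class BillboardSketch
{
    // Hypothetical corner offsets of the base quad, shared by every instance.
    static readonly Vector2[] Corners =
    {
        new Vector2(-0.5f, -0.5f),
        new Vector2( 0.5f, -0.5f),
        new Vector2(-0.5f,  0.5f),
        new Vector2( 0.5f,  0.5f),
    };

    // Expands one particle center into its four corner positions,
    // mirroring offset.x * Size.x * Side + offset.y * Size.y * Up.
    public static Vector3[] Expand(Vector3 center, Vector2 size, Vector3 side, Vector3 up)
    {
        var result = new Vector3[Corners.Length];
        for (int i = 0; i < Corners.Length; i++)
        {
            result[i] = center
                      + Corners[i].X * size.X * side
                      + Corners[i].Y * size.Y * up;
        }
        return result;
    }
}
```

Whether the size is 64 or 400, `Expand` returns four vertices per particle. What changes is the area each quad covers, and therefore how many pixels the GPU has to shade.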

The index buffer for the vertex buffer definition (i.e. the one defining one quad) is initialized like this:

int[] indices = new int[6];
indices[0] = 1;
indices[1] = 0;
indices[2] = 2;
indices[3] = 3;
indices[4] = 1;
indices[5] = 2;

indexBuffer = new IndexBuffer(_GfxDev, typeof(int), 6, BufferUsage.WriteOnly);
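On the 16-bit versus 32-bit question: as far as I know, passing `typeof(int)` as above already selects 32-bit indices, while `typeof(short)` or `typeof(ushort)` selects 16-bit ones. Index width only limits how many vertices of the base mesh can be addressed, not how many instances can be drawn. The helper below is a hypothetical sketch (`IndexWidth.ChooseIndexType` is not a MonoGame API) of how the element type could be picked from the vertex count.

```csharp
using System;

static class IndexWidth
{
    // Picks the smallest index element type able to address every vertex.
    // A ushort index can reference vertices 0..65535, i.e. 65536 vertices.
    public static Type ChooseIndexType(int vertexCount)
    {
        if (vertexCount < 0)
            throw new ArgumentOutOfRangeException(nameof(vertexCount));

        return vertexCount <= ushort.MaxValue + 1
            ? typeof(ushort)
            : typeof(int);
    }
}
```

For the 4-vertex quad, 16-bit indices would be enough, so the index width is probably not what causes the crash here.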

EDIT: in fact it seems to be related to the time needed to send data to the GPU and to the FPS.
Until now, the center of the particle emitter was at 0,0,0, where the player starts too.
I have moved it to 400,0,400; now the game starts, and just lags when the particles’ bounding box intersects the camera’s frustum.
So… What can be concluded ? :confused: Did a size of 400,400 create so much overlap during the first frame that the frames per second dropped to almost 0 and the GPU stalled ?
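That reading fits the symptoms. The "recovered from a problem" message is what Windows shows after a Timeout Detection and Recovery (TDR) reset, which triggers when the GPU takes too long (about two seconds by default) to finish its work. Below is a rough sketch of the fill cost, under the loud assumption that one size unit covers about one pixel and every quad is fully on screen, neither of which is stated above. `OverdrawEstimate` and its methods are hypothetical names.

```csharp
using System;

static class OverdrawEstimate
{
    // Upper bound on shaded pixels: assumes each quad is fully visible
    // and that one size unit maps to one pixel (an assumption).
    public static double ShadedPixels(long particleCount, double width, double height)
        => particleCount * width * height;

    // How many times more area one quad covers after a size change.
    public static double AreaRatio(double oldSize, double newSize)
        => (newSize * newSize) / (oldSize * oldSize);
}
```

Under those assumptions, a 400x400 quad covers about 39 times the area of a 64x64 one, so 100000 big particles cost roughly four times the fill of 1000000 small ones. Stacked right in front of the camera, that can be enough for a single frame to exceed the timeout.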

1 000 000 particles now work when not immediately around the camera: