Instanced Geometry performance

I have finished implementing instanced geometry rendering, but I’ve run into some “issues”. I am testing instances of a model with 6090 vertices (it should be half of that, but that’s a separate FBX matter I would like to ask about at the end).

Now the “issue” is that for 40 models there was no performance change. So I replaced all textures with extremely small ones, simplified the shader to the bare bones, and removed everything in the rendering of that model that I believed might cause a bottleneck. Still no change, so I pushed it to 100 models: still exactly the same framerate with both the instanced and non-instanced approach. Then to 400 models, still without any change. Scene model draw calls at that point: 2 vs 401.

So my question is: is this to be expected? Am I simply still at a number of draw calls that wouldn’t make any difference at all for a 6k-triangle model? Or is this more likely a problem on my side? I am running the tests on a GTX 770.

And my quick second question (here I am sure I am doing something wrong): I can successfully import and render models from FBX only if I export the model with split vertex normals and then let the MonoGame pipeline generate tangent frames. If I do not export split normals, the mesh is imported as a second bone, so the model in MonoGame has two bones: the root, and the mesh as a second bone. I haven’t found any way to export/import a model without splitting per-vertex normals while using the original tangents and binormals. I believe I tried all FBX versions from 2006 to 2014.

Thank you all for your time.

Hey @Moirai, are you using a DX or GL project?
EDIT: also, what version are you on?

Oh, yeah, I should have mentioned. I am using DX and the Pipeline.

Hmm, I’d definitely expect a speedup with instancing for this. Any chance you could share the project or a relevant part of it?

Give me a few moments (30-50 minutes) and I will post all the parts relevant to instanced geometry here. Thanks a lot for the quick answers.

That’s awesome :slight_smile: No worries, glad to help! If you could put it on Git (if you’re fine with having it public) that would be most convenient for me.

I realized I have modified the MonoGame source a bit (I needed to bind more than four render targets at a time), which complicates sharing the project as a whole. If you don’t mind, I will post the relevant parts of the code here. If there isn’t any obvious mistake in them, I will look into a way to share the project so you can just run it, if that’s fine.

Yeah, sure :slight_smile:

//Goes through the list of trees and creates the vertex buffer; this of course happens only on initialization of the scene
//Vertex declaration has byteStride 64, 4x Vector4 (world matrix); the world matrix is transposed (column major) to save a few instructions when building the matrix on the GPU
private void GenerateWorldMatrixBuffer(List<InstanciedModelInfo> modelList, ref VertexBuffer vertexBuffer)
{
    Matrix[] worldMatrixArray = new Matrix[modelList.Count];
    for(int i = 0; i < modelList.Count; i++)
        worldMatrixArray[i] = modelList[i].WorldMatrix;
    vertexBuffer = new VertexBuffer(RENDERING_MANAGER.device, VegetationVertex.VertexDeclaration, worldMatrixArray.Length, BufferUsage.WriteOnly);
    //Upload the matrices; without this the buffer stays empty
    vertexBuffer.SetData(worldMatrixArray);
}

//During scene initialization
//Generate World Matrix vertex buffer for all trees
GenerateWorldMatrixBuffer(tree01ModelsInfo, ref tree01WorldBuffer);

//Build bindings
tree01Bindings = new VertexBufferBinding[2];
//Whole model is in meshPart[0]
tree01Bindings[0] = new VertexBufferBinding(MODEL_MANAGER.tree01.Meshes[0].MeshParts[0].VertexBuffer);
//Vertex buffer of world matrices, frequency 1
tree01Bindings[1] = new VertexBufferBinding(tree01WorldBuffer, 0, 1);
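The declaration behind VegetationVertex.VertexDeclaration is not shown in this post. As a rough sketch only, a per-instance declaration matching the comments above (64-byte stride, four Vector4s, read by the shader as TEXCOORD1 to TEXCOORD4) could look like the following. The struct name InstanceWorldVertex and its field names are hypothetical and not taken from the project.

```csharp
using Microsoft.Xna.Framework;
using Microsoft.Xna.Framework.Graphics;

// Hypothetical per-instance vertex: one world matrix stored as four Vector4s.
public struct InstanceWorldVertex : IVertexType
{
    public Vector4 Col1;
    public Vector4 Col2;
    public Vector4 Col3;
    public Vector4 Col4;

    // 4 x 16 bytes = 64-byte stride. Usage indices 1..4 correspond to
    // TEXCOORD1..TEXCOORD4 in the vertex shader input.
    public static readonly VertexDeclaration VertexDeclaration = new VertexDeclaration(
        new VertexElement(0,  VertexElementFormat.Vector4, VertexElementUsage.TextureCoordinate, 1),
        new VertexElement(16, VertexElementFormat.Vector4, VertexElementUsage.TextureCoordinate, 2),
        new VertexElement(32, VertexElementFormat.Vector4, VertexElementUsage.TextureCoordinate, 3),
        new VertexElement(48, VertexElementFormat.Vector4, VertexElementUsage.TextureCoordinate, 4));

    VertexDeclaration IVertexType.VertexDeclaration
    {
        get { return VertexDeclaration; }
    }
}
```

Since a Matrix is also 64 bytes, an array of Matrix values can be uploaded into a buffer created with such a declaration, which is what the code above relies on.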

Draw cycle

DrawInstanciedModels(tree01Bindings, tree01.model.Meshes[0].MeshParts[0], tree01ModelsInfo.Count, tree01.effect);

private void DrawInstanciedModels(VertexBufferBinding[] bindings, ModelMeshPart mesh, int count, ModelShader effect)
{
    //set bindings
    RENDERING_MANAGER.device.SetVertexBuffers(bindings);
    //get indices
    RENDERING_MANAGER.device.Indices = mesh.IndexBuffer;
    //one draw call for all instances; primitive count is per instance
    RENDERING_MANAGER.device.DrawInstancedPrimitives(PrimitiveType.TriangleList, 0, 0, mesh.IndexBuffer.IndexCount / 3, count);
}

Vertex Shader

struct VertexShaderInput
{
    float4 Position : SV_Position;
    float4 Normal : NORMAL0;
    float3 Tangent : TANGENT0;
    float3 Binormal : BINORMAL0;
    float2 TexCoord : TEXCOORD0;
    float4 Color : COLOR0;
};

struct VertexShaderInstanceInput
{
    float4 row1 : TEXCOORD1;
    float4 row2 : TEXCOORD2;
    float4 row3 : TEXCOORD3;
    float4 row4 : TEXCOORD4;
};

VertexShaderOutput DebugVertexInstancied(VertexShaderInput input, VertexShaderInstanceInput input2)
{
    VertexShaderOutput output;
    //actually columns
    float4x4 WorldInstance = CreateMatrixFromCols(input2.row1, input2.row2, input2.row3, input2.row4);
    float objectID = dot(float3(WorldInstance._41, WorldInstance._42, WorldInstance._43), 1);
    float4 worldPosition = mul(input.Position, WorldInstance);
    float4 viewPosition = mul(worldPosition, View);

    output.Position = mul(viewPosition, Projection);
    output.WorldPos = worldPosition;
    output.TexCoord = input.TexCoord;
    //w = 0 so the translation part of the world matrix is ignored
    output.Normal = normalize(mul(float4(input.Normal.xyz, 0), WorldInstance));
    output.Tangent = mul(input.Tangent, (float3x3)WorldInstance);
    output.Tangent = normalize(output.Tangent);
    output.Binormal = mul(input.Binormal, (float3x3)WorldInstance);
    output.Binormal = normalize(output.Binormal);
    output.Color = input.Color;
    return output;
}

Thanks again. If this doesn’t help or is too inconvenient, I will do my best to get the project into a state where you can run it (but a lot of things are in a prototype state, so I think this might actually be more readable from an outside view).

By the way, alternatively, if it would be better for you, I can pack my project and send it to you directly. (I will just admit that although I use git, I would prefer to avoid the fight of creating and sharing a new repo :D)

Edit: Change of plan. If you send me your email or BitBucket user name, I will share my repo with you. I made a small change, and the instanced / non-instanced geometry test is now controlled by one bool.

My BitBucket name is ‘JesseGielen’ :slight_smile:

Done, Scene.cs
bool renderInstancied = true;

Thanks, I’ll take a look tomorrow (CET).

Hey @Moirai. I took a look at your project. You’re right that the draw calls don’t add enough overhead to matter compared to what the GPU has to handle. If I bump up the number of instances some more, the instanced version does perform faster (60 x 60 trees gave me 16 vs 24 fps; GTX 960M). I verified with RenderDoc that the draw calls occur as expected, so it’s not a MonoGame bug.

Not sure what the bottleneck is with fewer trees :confused:
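As a rough back-of-the-envelope sketch only, the 16 vs 24 fps figures above can be turned into an estimated cost per draw call. This assumes the whole frame-time difference comes from draw-call submission and that the cost scales linearly with the number of calls. Neither was measured, and the numbers come from a GTX 960M rather than the GTX 770. The class and method names are made up for illustration.

```csharp
using System;

public static class DrawCallEstimate
{
    // Convert frames per second to milliseconds per frame.
    public static double FrameMs(double fps)
    {
        return 1000.0 / fps;
    }

    // Estimated microseconds per removed draw call, ASSUMING the whole
    // frame-time difference is draw-call overhead (not a measurement).
    public static double MicrosecondsPerDrawCall(double slowFps, double fastFps, int drawCallsRemoved)
    {
        return (FrameMs(slowFps) - FrameMs(fastFps)) * 1000.0 / drawCallsRemoved;
    }

    public static void Main()
    {
        int trees = 60 * 60; // 3600 instances in the test above
        // Instancing replaces 3600 tree draw calls with 1, so 3599 are removed.
        double perCall = MicrosecondsPerDrawCall(16, 24, trees - 1);
        // With 400 trees, instancing removes 399 draw calls.
        double savedMsAt400 = perCall * 399 / 1000.0;
        Console.WriteLine("Estimated cost per draw call (microseconds): " + perCall);
        Console.WriteLine("Estimated saving per frame at 400 trees (ms): " + savedMsAt400);
    }
}
```

Under these assumptions the estimate comes out at roughly 6 microseconds per draw call, or about 2 ms per frame at 400 trees. A saving of that size would not show up in the framerate if the frame is GPU-bound or capped by vsync or a fixed time step, so it may be worth ruling those out before drawing conclusions.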

Thanks a lot for the info and your time. What confuses me is that instanced geometry only starts to pay off at frame rates that are not usable anyway (in your case 16 vs 24 fps), while causing additional overhead (for instance for culling, not to mention if you want to pull one tree out of the rest of the instances to, for example, destroy it or burn it down). And this is in a case where the instanced models are running with a very small, simple texture and pixel shader. It’s puzzling me. Anyway, thanks again! :slight_smile:

Oh, one more thing, if you are willing to spend some more time answering my questions. About that import thing: is it possible to import an FBX without split vertex normals? When I don’t use split vertex normals, the mesh is imported as a second bone. (Or should I open a new thread on this topic, since this is getting quite off-topic considering the name of this thread?)

Yeah, I was surprised as well. Thought for sure something was up. No worries :wink:

I didn’t understand exactly what you meant by that. So the model when loaded has two bones, the root bone that’s empty and then another bone that holds the entire mesh, is that correct?

Instead of a forum topic, please open an issue on our GitHub page because this sounds like a bug and that’s where we track those:
If you do, please provide a clear explanation of what the loaded model looks like. This might be a bug in how we import models.
