VertexBufferBindings/RenderTargetBindings and memory allocations

kosmonautgames · May 12, 2016, 6:46pm

Hi guys,

By far most of my on-the-fly memory allocations, the ones that fill up the garbage collector, come from SetVertexBuffers calls for instanced models. If I disable this part of rendering, my GC memory doesn’t fill up even half as fast.

Now is there anything I can do about that? I only call SetVertexBuffers once per frame, and I only use it for smoke geometry right now.

Normal SetVertexBuffer doesn’t seem to generate any garbage in comparison. The bindings I am sending for the geometry instancing are:

graphicsDevice.SetVertexBuffers(vbModelBinding, instanceBinding);

The vbModelBinding is a vertexBufferBinding to the model, which is just set up once.

The instanceBinding is a binding like this

instanceBinding = new VertexBufferBinding(instanceVertexBuffer, 0, 1);

I don’t change it per frame in terms of size, but I do change the instanceVertexBuffer per frame in terms of content.
Like this:

instanceVertexBuffer.SetData(instanceTransforms, 0, instanceTransforms.Length, SetDataOptions.Discard);

Now this is just a single Matrix per instance, so it’s really not that much data I am sending.

I had no problems with memory when I sent every piece of geometry individually (via SetVertexBuffer, Indices) 300 times. Performance on the GPU side is a lot better when sending 1 piece of geometry and 300 transformation matrices, but the GC hates it.

What to do?

Tom · May 12, 2016, 11:47pm

It shouldn’t generate much garbage at all, but the issue is possibly that the function uses the params keyword, which makes the compiler generate a new array on every call.

Then internal to that we additionally call another method that does the same thing:

https://github.com/MonoGame/MonoGame/blob/develop/MonoGame.Framework/Graphics/Vertices/VertexBufferBindings.cs#L99

          /// <summary>
          /// Binds the the specified vertex buffers to the input slots.
          /// </summary>
          /// <param name="vertexBufferBindings">The vertex buffer bindings.</param>
          /// <returns>
          /// <see langword="true"/> if the input layout was changed; otherwise,
          /// <see langword="false"/>.
          /// </returns>
          public bool Set(params VertexBufferBinding[] vertexBufferBindings)
          {
              Debug.Assert(vertexBufferBindings != null);
              Debug.Assert(vertexBufferBindings.Length > 0);
              Debug.Assert(vertexBufferBindings.Length <= _vertexBuffers.Length);
          
              bool isDirty = false;
              for (int i = 0; i < vertexBufferBindings.Length; i++)
              {
                  Debug.Assert(vertexBufferBindings[i].VertexBuffer != null);

So possibly we’re getting two new arrays being created on each call.

We should be able to provide an additional overload to avoid garbage in those cases.

@KonajuGames ???

KonajuGames · May 13, 2016, 12:33am

The internal call to Set(vertexBuffers) shouldn’t be generating any further garbage because vertexBuffers is already a VertexBufferBinding[] at this point due to the compiler-generated code at the call to SetVertexBuffers(vbModelBinding, instanceBinding).
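
The point about the compiler-generated array can be illustrated with a small standalone sketch. It does not use MonoGame: Binding is a hypothetical stand-in for VertexBufferBinding, and ParamsDemo and its method names are made up for illustration.

```csharp
using System;

// Hypothetical stand-in for VertexBufferBinding (a small struct); not MonoGame API.
public struct Binding
{
    public int Slot;
    public Binding(int slot) { Slot = slot; }
}

public static class ParamsDemo
{
    // A params method always receives an array. It hands the array back
    // so that callers can check which instance arrived.
    public static Binding[] Receive(params Binding[] bindings)
    {
        return bindings;
    }

    // Receive(a, b) is compiled as Receive(new Binding[] { a, b }):
    // a fresh array is created at the call site on every call.
    public static Binding[] CallWithArguments(Binding a, Binding b)
    {
        return Receive(a, b);
    }

    // Passing an existing array forwards it as-is; no second array is created.
    // This is the situation of the internal Set(vertexBuffers) call.
    public static Binding[] CallWithArray(Binding[] existing)
    {
        return Receive(existing);
    }
}
```

Two calls to CallWithArguments return two different array instances, while CallWithArray returns the very array it was given.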

There isn’t much we can do to eliminate this without providing a heap of overloads with varying numbers of VertexBufferBinding parameters. What @kosmonautgames can do right now is pre-allocate a VertexBufferBinding[2] and populate it with the relevant vertex buffer bindings before passing it to SetVertexBuffers.

vertexBuffers[0] = vbModelBinding;
vertexBuffers[1] = instanceBinding;
graphicsDevice.SetVertexBuffers(vertexBuffers);

This is exactly what the compiler generates, but without the allocation of a new array each call.
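
For anyone who wants to see the difference in numbers, here is a rough measurement sketch. It assumes a runtime that provides GC.GetAllocatedBytesForCurrentThread (.NET Core 3.0 or later); FakeBinding, AllocationProbe and SetBuffers are hypothetical stand-ins, not MonoGame API.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical stand-in for VertexBufferBinding; not MonoGame API.
public struct FakeBinding
{
    public int Slot;
}

public static class AllocationProbe
{
    private static FakeBinding[] _last;

    // Stand-in for a params-based API such as SetVertexBuffers. Storing the array
    // and disabling inlining keeps the JIT from optimizing the per-call array away.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static void SetBuffers(params FakeBinding[] bindings)
    {
        _last = bindings;
    }

    // Managed bytes allocated on this thread when the compiler builds the params array.
    public static long BytesWithParams(int calls)
    {
        var a = new FakeBinding { Slot = 0 };
        var b = new FakeBinding { Slot = 1 };
        SetBuffers(a, b); // warm-up, so one-time runtime work is not measured
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < calls; i++)
            SetBuffers(a, b);
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Managed bytes allocated on this thread when one pre-allocated array is reused.
    public static long BytesWithReusedArray(int calls)
    {
        var a = new FakeBinding { Slot = 0 };
        var b = new FakeBinding { Slot = 1 };
        var reusable = new FakeBinding[2]; // allocated once, before measuring
        SetBuffers(reusable);              // warm-up
        long before = GC.GetAllocatedBytesForCurrentThread();
        for (int i = 0; i < calls; i++)
        {
            reusable[0] = a;
            reusable[1] = b;
            SetBuffers(reusable);
        }
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

The first number should grow with the call count, and the second should stay at or near zero. Exact values depend on the runtime and platform.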

kosmonautgames · May 13, 2016, 12:56am

Thanks a lot, that did the trick, it’s not showing up now on the hot path at all, which is nice.

I just basically worked from the Microsoft XNA instancing sample, and I guess a lot of stuff is not great there.

For example, they resize the array every frame. They do stuff like:

// Gather instance transform matrices into a single array.
Array.Resize(ref instanceTransforms, instances.Count);

…

// If we have more instances than room in our vertex buffer, grow it to the necessary size.
if ((instanceVertexBuffer == null) ||
    (instances.Length > instanceVertexBuffer.VertexCount))
{
    if (instanceVertexBuffer != null)
        instanceVertexBuffer.Dispose();

    instanceVertexBuffer = new DynamicVertexBuffer(GraphicsDevice, instanceVertexDeclaration,
                                                   instances.Length, BufferUsage.WriteOnly);
}

// Transfer the latest instance transform matrices into the instanceVertexBuffer.
instanceVertexBuffer.SetData(instances, 0, instances.Length, SetDataOptions.Discard);

…

foreach (ModelMeshPart meshPart in mesh.MeshParts)
{
    // Tell the GPU to read from both the model vertex buffer plus our instanceVertexBuffer.
    GraphicsDevice.SetVertexBuffers(
        new VertexBufferBinding(meshPart.VertexBuffer, meshPart.VertexOffset, 0),
        new VertexBufferBinding(instanceVertexBuffer, 0, 1)
    );

per frame, which really is not optimal, I guess.
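
One possible way around the per-frame resize is a scratch array that only ever grows, so a changing instance count does not allocate a new array each time (Array.Resize allocates whenever the requested length differs from the current one). This is a minimal sketch of that idea, not what the XNA sample or MonoGame does; GrowOnlyBuffer and its members are hypothetical names.

```csharp
using System;

// Hypothetical helper: a scratch array that grows but never shrinks.
public sealed class GrowOnlyBuffer<T>
{
    private T[] _items;

    public GrowOnlyBuffer(int initialCapacity)
    {
        _items = new T[Math.Max(1, initialCapacity)];
    }

    // Backing array; only the first `count` entries used this frame are meaningful.
    public T[] Items { get { return _items; } }

    public int Capacity { get { return _items.Length; } }

    // Returns true only when the backing array had to be reallocated.
    // Growth doubles the capacity so that reallocations become rare.
    public bool EnsureCapacity(int required)
    {
        if (required <= _items.Length)
            return false;

        int newCapacity = Math.Max(required, _items.Length * 2);
        Array.Resize(ref _items, newCapacity);
        return true;
    }
}
```

In the draw code one would call EnsureCapacity(instanceCount), fill Items[0..instanceCount) and pass instanceCount as the element count to SetData. The instance vertex buffer can be recreated only when EnsureCapacity returns true.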

I changed some stuff about that, well still a lot to optimize obviously.

NOW

my biggest memory filler is SetRenderTarget. I guess I can’t do anything about that though, right?

For stuff like bloom I need to change it at least 10 times, but that shouldn’t be a problem, should it?
E.g.

_graphicsDevice.SetRenderTarget(_bloomRenderTarget2DMip2);

But render target changes alone account for more than 20% of my allocated bytes.

So … I need to try it out later, but is it generally preferable to use stuff like this?

RenderTargetBinding bindingBloomRenderTarget2DMip0 = new RenderTargetBinding(_bloomRenderTarget2DMip0);
[…]
_graphicsDevice.SetRenderTargets(bindingBloomRenderTarget2DMip0);

EDIT:

OK OK, back with some research, which I guess you guys knew already.

So it turns out:

A RenderTarget initialized earlier, passed to SetRenderTarget(RenderTarget): creates a lot of garbage.
A RenderTargetBinding initialized earlier, passed to SetRenderTargets(RenderTargetBinding): same garbage.
A RenderTargetBinding[] array initialized earlier, passed to SetRenderTargets(RenderTargetBinding[]): no garbage.

It is a bit frustrating that this seems to be the optimal way, yet I have never ever seen it done like this in any sample or tutorial.

Soooo to sum up. The way to do it is apparently:
1. Create a field in your draw class like this:

RenderTargetBinding[] myRenderTargetBinding = new RenderTargetBinding[1];

2. When creating the render targets (i.e. when resizing or initializing the game), do this:

myRenderTarget2D = new RenderTarget2D( …some properties…);
myRenderTargetBinding[0] = myRenderTarget2D;

3. In your draw method, use this:

graphicsDevice.SetRenderTargets(myRenderTargetBinding);

KonajuGames · May 13, 2016, 1:53am

Yes, that is correct. We could easily do something behind the scenes in GraphicsDevice to eliminate the garbage generation from SetRenderTarget(RenderTarget2D) and SetRenderTarget(RenderTargetCube, CubeMapFace), so that it uses an internal RenderTargetBinding[1] in the internal call to SetRenderTargets(params RenderTargetBinding[]). That would be an easy win for the general use case.
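
A rough sketch of how such an internal one-element array could look is below. It is not MonoGame's actual implementation: FakeDevice, FakeTarget and FakeTargetBinding are hypothetical stand-ins, and the slot count of 4 is an arbitrary assumption.

```csharp
using System;

// Hypothetical stand-ins for RenderTarget2D and RenderTargetBinding; not MonoGame API.
public sealed class FakeTarget
{
    public readonly string Name;
    public FakeTarget(string name) { Name = name; }
}

public struct FakeTargetBinding
{
    public readonly FakeTarget Target;
    public FakeTargetBinding(FakeTarget target) { Target = target; }
}

public sealed class FakeDevice
{
    // One-element scratch array, allocated once and reused by SetRenderTarget.
    private readonly FakeTargetBinding[] _singleBinding = new FakeTargetBinding[1];

    // The device copies bindings into its own storage, so reusing the scratch
    // array on a later call cannot corrupt the current state.
    private readonly FakeTargetBinding[] _current = new FakeTargetBinding[4];

    public int CurrentCount { get; private set; }

    public FakeTarget GetCurrentTarget(int index)
    {
        return _current[index].Target;
    }

    public void SetRenderTargets(params FakeTargetBinding[] bindings)
    {
        if (bindings.Length > _current.Length)
            throw new ArgumentException("Too many render targets.", "bindings");

        Array.Copy(bindings, _current, bindings.Length);
        CurrentCount = bindings.Length;
    }

    // Single-target convenience overload: fills the scratch array instead of
    // letting the compiler build a new params array on each call.
    public void SetRenderTarget(FakeTarget target)
    {
        _singleBinding[0] = new FakeTargetBinding(target); // struct, no heap allocation
        SetRenderTargets(_singleBinding);
    }
}
```

The copy inside SetRenderTargets is what makes reusing the scratch array safe in this sketch. If the device kept a reference to the passed array instead, a later SetRenderTarget call would silently change the stored state.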

kosmonautgames · May 13, 2016, 2:08am

Probably a good idea,
especially given that, even though I create literally thousands of useless things each frame, both SetVertexBuffers and SetRenderTarget were still by far the biggest items in terms of bytes allocated.

But it would probably be nice to mention the correct way of doing things in some getting started guides as well.

I would love to expand the lackluster documentation, but I simply cannot allocate (haha) the time. My game is just a hobby anyway; with less time on that it wouldn’t move at all.

kosmonautgames · May 13, 2016, 3:33pm

I wrote down these observations cleanly for people who want to improve their code.