Vertex Buffer is not any faster than SpriteBatch

In a 2D tile-based platformer, I tried drawing all of my tiles from a vertex and index buffer with DrawIndexedPrimitives instead of SpriteBatch, because I heard it was MUCH faster. However, it is not any faster than SpriteBatch: the CPU usage is the same. First the TileAtlasBuffer is initialized, then every tile is added with AddTile (each tile is a textured quad), and then SetData is called. After that, only the Draw method is called each frame. In particular, DrawIndexedPrimitives is no faster than drawing all the tiles one by one with SpriteBatch. What am I doing wrong?


    private class TileAtlasBuffer
    {
        public Texture2D texture;
        public VertexBuffer vertexBuffer;
        public IndexBuffer indexBuffer;
        private VertexPositionTexture[] vertices;
        private short[] indices;
        private int tileCount;

        private GraphicsDevice GraphicsDevice
        {
            get { return Core.Instance.GraphicsDevice; }
        }

        // 16-bit indices: one buffer can address at most 65,536 vertices (16,384 tiles).
        public TileAtlasBuffer(Texture2D texture, int tiles)
        {
            this.texture = texture;
            vertexBuffer = new VertexBuffer(GraphicsDevice, typeof(VertexPositionTexture), tiles * 4, BufferUsage.WriteOnly);
            indexBuffer = new IndexBuffer(GraphicsDevice, typeof(short), tiles * 6, BufferUsage.WriteOnly);
            vertices = new VertexPositionTexture[tiles * 4];
            indices = new short[tiles * 6];
            tileCount = 0;
        }

        // Uploads the CPU-side arrays to the GPU; called once, after all tiles are added.
        public void SetData()
        {
            vertexBuffer.SetData(vertices);
            indexBuffer.SetData(indices);
        }

        public void Draw(BasicEffect basicEffect)
        {
            if (tileCount > 0)
            {
                GraphicsDevice.SetVertexBuffer(vertexBuffer);
                GraphicsDevice.Indices = indexBuffer;

                basicEffect.TextureEnabled = true;
                basicEffect.Texture = texture;
                foreach (var pass in basicEffect.CurrentTechnique.Passes)
                {
                    pass.Apply();
                    // Two triangles per tile, all tiles in a single draw call.
                    GraphicsDevice.DrawIndexedPrimitives(PrimitiveType.TriangleList, 0, 0, tileCount * 2);
                }
            }
        }

        public void AddTile(int tx, int ty, Rectangle rect, bool flippedHorizontally, bool flippedVertically)
        {
            // Convert the source rectangle from pixels to UV coordinates in [0, 1].
            float textureSizeX = 1f / texture.Width;
            float textureSizeY = 1f / texture.Height;

            float left = rect.Left * textureSizeX;
            float right = rect.Right * textureSizeX;
            float bottom = rect.Bottom * textureSizeY;
            float top = rect.Top * textureSizeY;

            if (flippedHorizontally)
            {
                float temp = left;
                left = right;
                right = temp;
            }
            if (flippedVertically)
            {
                float temp = top;
                top = bottom;
                bottom = temp;
            }

            // Four corners of the quad.
            int vertexCount = tileCount * 4;
            vertices[vertexCount] = new VertexPositionTexture(new Vector3(tx, ty + rect.Height, 0), new Vector2(left, bottom));
            vertices[vertexCount + 1] = new VertexPositionTexture(new Vector3(tx, ty, 0), new Vector2(left, top));
            vertices[vertexCount + 2] = new VertexPositionTexture(new Vector3(tx + rect.Width, ty + rect.Height, 0), new Vector2(right, bottom));
            vertices[vertexCount + 3] = new VertexPositionTexture(new Vector3(tx + rect.Width, ty, 0), new Vector2(right, top));

            // Two triangles: (0, 1, 2) and (2, 1, 3).
            int indexCount = tileCount * 6;
            indices[indexCount] = (short)vertexCount;
            indices[indexCount + 1] = (short)(vertexCount + 1);
            indices[indexCount + 2] = (short)(vertexCount + 2);

            indices[indexCount + 3] = (short)(vertexCount + 2);
            indices[indexCount + 4] = (short)(vertexCount + 1);
            indices[indexCount + 5] = (short)(vertexCount + 3);

            tileCount++;
        }
    }


There’s probably not much of a difference, because SpriteBatch does pretty much the same thing you are doing: it fills a vertex buffer and then draws it.
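To illustrate, here is a deliberately simplified batcher. It is not MonoGame's actual SpriteBatch code; the class name ToyBatcher, the float layout and the capacity are made up for this sketch. It copies four vertices per sprite into an array every frame and only issues a new draw call (Flush) when the texture changes or the array is full, so tiles that share one atlas texture already end up in very few draw calls.

```csharp
using System;

// Simplified stand-in for a sprite batcher (not MonoGame's real SpriteBatch):
// sprites are appended to a CPU-side vertex array every frame, and a "draw call"
// (Flush) happens only when the texture changes or the array is full.
sealed class ToyBatcher
{
    private const int FloatsPerSprite = 16; // 4 vertices * (x, y, u, v)
    private readonly float[] vertexData;
    private readonly int maxSprites;
    private int spriteCount;
    private int currentTexture = -1;

    public int DrawCalls { get; private set; }

    public ToyBatcher(int maxSprites)
    {
        if (maxSprites <= 0) throw new ArgumentOutOfRangeException(nameof(maxSprites));
        this.maxSprites = maxSprites;
        vertexData = new float[maxSprites * FloatsPerSprite];
    }

    public void Draw(int textureId, float x, float y, float w, float h)
    {
        // A texture switch or a full array forces the pending sprites out first.
        if (spriteCount > 0 && (textureId != currentTexture || spriteCount == maxSprites))
            Flush();

        currentTexture = textureId;
        int o = spriteCount * FloatsPerSprite;
        Write(o, x, y, 0f, 0f);
        Write(o + 4, x + w, y, 1f, 0f);
        Write(o + 8, x, y + h, 0f, 1f);
        Write(o + 12, x + w, y + h, 1f, 1f);
        spriteCount++;
    }

    // A real renderer would upload the pending vertices here and issue one draw call.
    public void Flush()
    {
        if (spriteCount == 0) return;
        DrawCalls++;
        spriteCount = 0;
    }

    private void Write(int offset, float px, float py, float u, float v)
    {
        vertexData[offset] = px;
        vertexData[offset + 1] = py;
        vertexData[offset + 2] = u;
        vertexData[offset + 3] = v;
    }
}
```

With one atlas texture, 500 tiles cost a single Flush here. What remains per frame is rebuilding (and, in a real renderer, uploading) the vertex data, which is the part a static buffer avoids.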

I would only expect it to do a bit better if your vertex buffer is static (created once at the beginning and never changed). In that case the vertex data doesn’t need to be rebuilt every frame, nor transferred to the GPU every frame.

You will still need quite a large number of sprites to see a noticeable difference in CPU usage between the static and dynamic methods.
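A rough way to size what a static buffer saves: a dynamic approach rebuilds and re-uploads four vertices per tile every frame. The sketch below assumes a vertex with the same layout as VertexPositionTexture (a Vector3 position plus a Vector2 texture coordinate, 20 bytes); PosTexVertex and UploadCost are hypothetical names.

```csharp
using System;
using System.Runtime.InteropServices;

// Assumed to match VertexPositionTexture's layout: Vector3 position + Vector2 UV.
[StructLayout(LayoutKind.Sequential)]
struct PosTexVertex
{
    public float X, Y, Z, U, V;
}

static class UploadCost
{
    public static int VertexSize => Marshal.SizeOf<PosTexVertex>();

    // Bytes of vertex data a dynamic approach rebuilds and uploads every frame
    // (4 vertices per tile). A static buffer pays this once, then 0 per frame.
    public static long DynamicBytesPerFrame(int tiles) => (long)tiles * 4 * VertexSize;
}
```

For 500 tiles that is 40,000 bytes per frame, which is likely small for a modern GPU, so it makes sense that the difference only becomes noticeable with many more sprites.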

The vertex and index buffers are both static, as I explained: they are created once at the beginning and never changed, and only the Draw method is called each frame. I was quite disappointed by the performance (it takes about 1 ms to draw 500 tiles), because the person who answered this question:

claimed to be able to draw 500,000 tiles in 1/20th of a millisecond with this approach.

It’s so slow because the overhead of DrawIndexedPrimitives is so large: even when I draw just one tile, it takes almost as long. Is there an alternative function with a smaller overhead?

A single draw call shouldn’t take 1 ms. I guess you are including other overheads in that 1 ms. Keep in mind that there is some small overhead every frame, even if you don’t draw anything.

No, I’m using the Visual Studio performance profiler, which shows exactly which method the overhead comes from. If I just comment out the line with DrawIndexedPrimitives, the time it takes to execute the Draw method is negligible.

Actually, it’s a bit under 1 ms, and I’m drawing 3 times per frame because I have 3 tile maps, so it’s more like 0.25 ms per draw call. That still seems excessive.

And what if you call DrawIndexedPrimitives twice in your Draw function? Does it then go up to 2 ms? At 1 ms per draw call you could only draw about 16 objects at 60 fps, which is far too few.
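The frame-budget arithmetic behind that estimate, as a small helper (FrameBudget.MaxDrawCalls is a hypothetical name). It only counts draw-call cost and ignores everything else a frame has to do: at 60 fps a frame lasts 1000 / 60 ≈ 16.7 ms, so at 1 ms per call only 16 whole calls fit.

```csharp
using System;

static class FrameBudget
{
    // How many draw calls of a given CPU cost fit into one frame at the target frame rate.
    public static int MaxDrawCalls(double msPerCall, double fps)
    {
        if (msPerCall <= 0) throw new ArgumentOutOfRangeException(nameof(msPerCall));
        if (fps <= 0) throw new ArgumentOutOfRangeException(nameof(fps));
        double frameMs = 1000.0 / fps; // about 16.67 ms at 60 fps
        return (int)Math.Floor(frameMs / msPerCall);
    }
}
```

At 0.25 ms per call the limit is 66 calls, and at 0.02 ms per call it is 833.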

EDIT: As you said, even 0.25 ms per draw call is too much. Is this the only draw call in your program? Maybe what you are seeing is some kind of fixed overhead for drawing, and only the first draw call is slow. I can’t imagine it scaling that way once you add more draw calls.

Weird: the overhead is 0.9 ms in total whether I draw once, three times, or 100 times. I have other draw calls with SpriteBatch, but the profiler breaks them down, so I can see how much each of them costs.

I tried benchmarking it with Stopwatch instead, and it reports 0.02 ms per draw call, scaling linearly with the number of draw calls. So, as you said, there must be something wrong with the diagnostics tool.
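A minimal Stopwatch-based timing helper along those lines (DrawTimer and AverageMilliseconds are hypothetical names). Keep in mind that GPU work is queued asynchronously, so this measures the CPU-side cost of issuing the calls, not how long the GPU takes to render them.

```csharp
using System;
using System.Diagnostics;

static class DrawTimer
{
    // Runs `action` `iterations` times and returns the average wall-clock time per call in ms.
    public static double AverageMilliseconds(Action action, int iterations)
    {
        if (action == null) throw new ArgumentNullException(nameof(action));
        if (iterations <= 0) throw new ArgumentOutOfRangeException(nameof(iterations));

        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            action();
        stopwatch.Stop();

        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

For example, `DrawTimer.AverageMilliseconds(() => tileAtlasBuffer.Draw(basicEffect), 100)` inside your game's Draw method would report the average CPU time per call, where tileAtlasBuffer and basicEffect stand for your own instances.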

In conclusion, using a vertex and index buffer is 8-30 times faster than using SpriteBatch, depending on how many tiles are within the camera bounds. The gain would be greater if the camera were zoomed out further (more tiles to render), which might come in handy at some point; the game Axiom Verge, for example, uses this technique to zoom out during certain boss encounters.

With this approach I can’t do camera culling, but I can use a DynamicVertexBuffer and DynamicIndexBuffer, fill them with a large number of tiles around the player, and update them when the player moves out of that area or when tiles are modified. This is similar to my previous approach, where I drew the tiles to a RenderTarget2D with SpriteBatch and then drew that as one big texture. Drawing that texture is very efficient, but rendering into it is very slow (about 100 ms), so when the camera moves far enough that the RenderTarget2D has to be updated, you get a hiccup and lose a few frames. Calling SetData on a dynamic vertex and index buffer is much faster in comparison.
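The "refill when the player leaves the area" idea can be sketched as a small window tracker. TileWindow, its parameters and the rebuild-near-the-edge rule are assumptions for illustration. When Update returns true, you would refill the DynamicVertexBuffer and DynamicIndexBuffer with the tiles in [Left, Right) × [Top, Bottom) and call SetData.

```csharp
using System;

// Hypothetical helper: remembers which block of tiles is currently in the dynamic
// buffers and reports when it has to be rebuilt around the player.
sealed class TileWindow
{
    private readonly int halfWidth, halfHeight; // window half-size in tiles
    private readonly int margin;                // rebuild when the player is this close to an edge
    private bool built;

    public int Left { get; private set; }
    public int Top { get; private set; }
    public int Right => Left + 2 * halfWidth;   // exclusive
    public int Bottom => Top + 2 * halfHeight;  // exclusive
    public int Rebuilds { get; private set; }

    public TileWindow(int halfWidth, int halfHeight, int margin)
    {
        // margin must be smaller than the half-size, or a freshly centered window
        // would immediately count as "near the edge" again.
        if (margin < 0 || margin >= halfWidth || margin >= halfHeight)
            throw new ArgumentOutOfRangeException(nameof(margin));
        this.halfWidth = halfWidth;
        this.halfHeight = halfHeight;
        this.margin = margin;
    }

    // Returns true when the caller should refill the buffers with the new window.
    public bool Update(int playerTileX, int playerTileY)
    {
        bool nearEdge = playerTileX < Left + margin || playerTileX >= Right - margin
                     || playerTileY < Top + margin || playerTileY >= Bottom - margin;
        if (built && !nearEdge) return false;

        // Re-center the window on the player.
        Left = playerTileX - halfWidth;
        Top = playerTileY - halfHeight;
        built = true;
        Rebuilds++;
        return true;
    }
}
```

The margin trades buffer size for rebuild frequency: a larger window means more vertices per SetData, but fewer refills while the player moves.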

By the way, is there any way to control the draw order of the tiles within a single draw call?

If it’s a single draw call, you can simply sort the tiles before filling your index buffer.
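A sketch of that: sort the tiles by a layer value, then emit their indices in that order using the same two-triangle pattern as AddTile in the question. Within one draw call, triangles are blended in the order they appear in the index buffer, so higher layers end up on top, and the vertex data itself does not need to move. TileRef and TileOrder are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical tile record: which quad in the vertex buffer, and its draw layer.
readonly struct TileRef
{
    public readonly int Quad;
    public readonly int Layer;
    public TileRef(int quad, int layer) { Quad = quad; Layer = layer; }
}

static class TileOrder
{
    // Builds a 16-bit index array with quads in ascending layer order.
    // OrderBy is a stable sort, so tiles on the same layer keep their original order.
    public static short[] BuildSortedIndices(IEnumerable<TileRef> tiles)
    {
        var sorted = tiles.OrderBy(t => t.Layer).ToList();
        var indices = new short[sorted.Count * 6];
        for (int i = 0; i < sorted.Count; i++)
        {
            int v = sorted[i].Quad * 4; // first vertex of this quad
            int k = i * 6;
            indices[k] = (short)v;
            indices[k + 1] = (short)(v + 1);
            indices[k + 2] = (short)(v + 2);
            indices[k + 3] = (short)(v + 2);
            indices[k + 4] = (short)(v + 1);
            indices[k + 5] = (short)(v + 3);
        }
        return indices;
    }
}
```

After changing the order you would call SetData on the index buffer again; the vertex buffer stays as it is.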

Alternatively, or if you need draw order across draw calls, you can use a depth buffer and move tiles closer to or farther from the camera (via the z component, if you’re using the same coordinate system as SpriteBatch).
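A sketch of mapping a layer index to a z value for the depth-buffer approach. It assumes a projection under which a vertex's z ends up as its depth in [0, 1] with 0 nearest, and a depth test that passes on LessEqual (the comparison used by DepthStencilState.Default); TileDepth.LayerToDepth is a hypothetical helper.

```csharp
using System;

static class TileDepth
{
    // Maps a layer (0 = bottom, layerCount - 1 = top) to a z value in [0, 1].
    // Assumes smaller z is closer to the camera, so the top layer gets z = 0 and wins the depth test.
    public static float LayerToDepth(int layer, int layerCount)
    {
        if (layerCount < 1) throw new ArgumentOutOfRangeException(nameof(layerCount));
        if (layer < 0 || layer >= layerCount) throw new ArgumentOutOfRangeException(nameof(layer));
        if (layerCount == 1) return 0f;
        return 1f - (float)layer / (layerCount - 1);
    }
}
```

The result would go into the z of each Vector3 in AddTile. Note that the depth test works on whole quads, so fully transparent pixels at tile edges still write depth and can hide tiles behind them unless they are discarded (for example with AlphaTestEffect) or the tiles are also drawn back to front.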
