Fast Tile Engine

Beckett_Firestone · June 16, 2020, 10:47pm

I’m starting a tile engine in Monogame, and as somebody who doesn’t understand the way memory works under the hood, I was just wodering, for performance sake: Would I be better off storing tiles as c# objects, or have their fields/properties in arrays. For example sprites[12] and tiles[12] would correspond, and I can do all the logic and drawing from one class. How much faster would that actaully be (int terms of having ~10,000 tiles)? Thanks in advance.

MrValentine · June 17, 2020, 1:46am

This comment was funny lol:

Apologies for him using the old MSPaint…

Here’s a longer and more relative one:

This one doesn’t answer the question maybe but eh:

boot · June 17, 2020, 3:24am

I wouldnt use multiple arrays for storing different properties.

For maximum memory efficiency you would have a reference list of pregenerated tiles. So if you had less than 256 kinds of tiles you could have 255 pregenerated tiles and have them in an array by index.

You could then have your grid to be a 2d array of bytes. Each byte is a number under 256 and only takes up 1 byte. So making a map with millions of tiles will be very small

If your expecting more than 255 tiles another alternative is have your pregenerated tiles. Then have your 2d array to be an array of tiles. Each tile reference will then take up 4 bytes.

This is the cheapest solution memory wise. You dont have to store individual tiles or locations. For drawing your tiles you know your location because of the location in the 2d array. You can pass thru this x and y in the tiles drawing method and work out a draw location based on the x, y and tilesize.

willmotil · June 17, 2020, 4:57am

Some of the terminology terms in these videos are wrong such as calling a reference a pointer ect.

other then that good find.

Cory_Kroll · June 17, 2020, 4:57am

Keep in mind, too, that drawing your entire tilemap doesn’t need to be done every frame. You could do it once per level and store it in memory (a Texture2D perhaps) and then simply copy the part that is viewable each frame from the already rendered texture.
When you go between levels, you can dispose of the tile map rendered texture and load a new one while showing a clever animation - like perhaps an elevator animation

Cory_Kroll · June 17, 2020, 5:00am

Monogame Extended does have a TileMap renderer, and that project is open source. It might be worth checking out for ideas.

willmotil · June 17, 2020, 5:44am

There is a couple examples here on different ways to draw a tilemap depending on the constraints of the particular situation or game ect…

Personally i like nkasts suggestion if it can encompass all you’re requisites then i think its a good way to go.

Thomas_Vantroyen · June 17, 2020, 8:24am

If you have multiple fields belonging logically together, you should put them together. If you want a list of them to be in ram in a contiguous block, you van use a struct instead of a class (public struct Tile { … }) . An array of structs will be stored as one contiguous block in ram. This is how most ECS implementations work: store the data in arrays of structs and have a single “system” apply logic to all of them in a very efficient for-loop.

This has nothing to do with stack vs heap though, and everything with cache missing vs hitting. If your tiles are classes, then an array of Tile will be an array of references to random other places in ram where those individual tiles are stored. Now, when a cpu needs to work on a piece of memory it fetches quite a large block of ram (around the part you need) into cpu cache before working on it. So, when you are looping over all of your tiles, you will be loading many pieces of different ram from all those random places where your tiles are stored into the cpu’s cache, which is very time consuming. But when you use an array of struct, all your tiles will be next to each other in ram, so, when fetching a chunk of ram into cache, MANY of your tiles will be in that single piece of ram, so you can reuse that cache-fetching effort for many of your tiles which is a LOT faster.

How much faster vs each tile its own object with method calls per tile? Hard to say in numbers, but i guarantee you “a lot faster”

So, in short, for high performance of large collections of simple game objects:

use struct instead of class
use a separate class for the logic with a for-loop

boot · June 17, 2020, 2:28pm

Ive always understood that you should never use a struct if it’s going to be more than 16 bytes or if its mutable.

Structs can also cause performance slow downs as you will make copies of them if you pass them around.

But i think if the OP is only drawing his tiles on the screen, looping through a few hundred structs as opposed to classes is not really going to gain any performance. The bottlenecks will always be the drawing code thats executed on the tiles.

LithiumToast · June 17, 2020, 3:32pm

you should never use a struct if it’s going to be more than 16 bytes or if its mutable

Why? Did you know that Matrix is a struct and it is more than 16 bytes?

can also cause performance slow downs as you will make copies of them if you pass them around

Did you know you can pass structs by reference using the ref keyword?

looping through a few hundred structs as opposed to classes is not really going to gain any performance

Not true. In C# the memory for classes is always stored on the heap. Thus the memory for an array of classes is not guaranteed to be one block of memory. I recommend you read this: Data Locality · Optimization Patterns · Game Programming Patterns

boot · June 17, 2020, 4:06pm

c# - Why should a .NET struct be less than 16 bytes? - Stack Overflow.

Then if your going to pass things around by reference use a class. There is no benefit for him using a struct. Structs should be mutable. The main benefit to structs are garbage collection, but i doubt the OP needs to garbage collect his tiles.

Arrays of structs can end up on the heap

I just ran a test.
Looping through an array of 100 000 structs and calling an empty method 100 times and getting the average. 0.3842 milliseconds
Looping through an array of 100 000 classes and calling an empty method 100 times and getting the average. 0.4235 milliseconds

The difference is absolutely miniscule. The OP will only ever be drawing a few hundred tiles at a time on the screen. If he wants to optomise performance he should be doing things like manual sprite sort mode, minimising texture changes and minimising render target changes.

Thomas_Vantroyen · June 17, 2020, 5:16pm

I think you are confusing some concepts. Yes, passing structs along on the stack from method to method is copying more bytes than a reference would be. But this is not what is the usecase here. Here, we are talking about storing large amount of gamedata in contiguous memory blocks and reading them all, each frame.

And besides, as LithiumToast says, if you want to pass structs in the array to methods, you can pass structs by ref, which is not the same as using a reference to a class instance. A ref is a concept where the CLR keeps track of the original memory location where the struct is located. So a local ref variable in a method scope can write to the struct directly without losing the benefit of less cachemissing because the structarray is a contiguous block in ram.

With classes, you are always stuck with the double lookup step: read reference (address) in array, read referenced location in ram). This is something that has always been the default approach in high performance C++ programs and since the introduction of ref structs, .NET can now do this too.

BlizzCrafter · June 17, 2020, 6:51pm

Just wanted to add that to the topic because it could be of interest:

willmotil · June 17, 2020, 8:50pm

It’s 48 bytes after that you start getting a penalty that begins to increase in a non linear fashion.
Under 16 bytes its better then linear time.

I didn’t run this test someone else did but it seems to back up what others have said.

Here is the result of a test that I did on my 64 bit system copying structs half a billion times:
struct 4    : 272 ms.
struct 8    : 235 ms.
struct 16   : 317 ms.
struct 32   : 625 ms.
struct 64   : 1280 ms.
struct 128  : 4659 ms.
struct 256  : 8020 ms.
As you see, below 16 bytes the time is not linear, although 16 bytes is four times as much as 4 bytes, it doesn’t take four times longer. Above 16 bytes the time is linear, so doubling the size double the time. That’s where it would start using multiple moves. Above 64 bytes there is a jump, where the time suddenly quadruples when the size doubles. That’s where the fallback would start to be used.

The way i understand it is
Their is a 16 byte cost for the instruction then 16 bytes of data on a 32 bit program.
On a 64 byte processor its 64 - 16;

It’s not a super big deal most of the time.
It really doesn’t have anything to do with getting a contiguous block of memory either.
That depends on how when and were you define and declare things primarily.

Personally if you need a critical array block this is just from experience the earlier you allocate a huge chunk of memory in your program the better chance you will get it.

So basically in your constructor define a full array or a list with Capacity and initialize it with dummy values right on the spot.
Yes it’s not guarenteed for heap stuff because you might have heap framentation, but the earlier you do it the better the chance you will get it in a nice way how you like it.

One more note… any array or collection containing over 10,000 index’s goes on the LOH. That is actually noted in the CLI specs.

LithiumToast · June 17, 2020, 8:55pm

Even with your linked Stackoverflow question, the top answer says “It is just a … rule of thumb” and “If you need a struct’s behavior, then I would say to make it a struct, no matter the size.”

And yes a struct can be allocated on the heap. But this is not about heap vs stack. This is about data locality; the total memory for an array of structs is guaranteed to be in one contiguous block of memory. This means that when you loop over the elements you are not chasing “pointers” and thrashing the cache.

The difference is absolutely minuscule.

Your test is not a game. “Miniscule” x100 or even x1000 becomes a problem. Also, testing these things can be hard because of a lot of factors such as debug vs release, different JIT compilers, etc.

Again, I highly recommend you read the data locality chapter I linked above.

MrValentine · June 17, 2020, 9:33pm

I double this, that book is fantastic! I have it on my bookshelf and in my kindle collection.

Arcadenut · June 17, 2020, 10:48pm

I would recommend this book:

https://www.amazon.com/Zen-Code-Optimization-Ultimate-Software/dp/1883577039#:~:text=Zen%20of%20Code%20Optimization%3A%20The%20Ultimate%20Guide%20to%20Writing%20Software,Limit%20Paperback%20–%20December%208%2C%201994&text=Find%20all%20the%20books%2C%20read%20about%20the%20author%2C%20and%20more.&text=Provides%20unique%20coverage%20of%20Intel’s,aspects%20that%20affect%20code%20performance.

Learn the right way to optimize…

Dismiss · June 23, 2020, 7:14pm

Struct tiles are a real pain to work with though.

I found id rather have a class and not have to deal with pass by ref arguments cluttering up all my apis, or having to resolve ‘tileid’ to a tile repeatedly … also you can’t really make use of interfaces with a struct or everytime it is boxed you generate garbage.

Also, collecting a list of tiles (something that happens a lot, in tile based games) is enormously easier if its a class and collecting tiles you can then directly work with, rather than a unique and expensive copy which you then have to still remember to apply to the ‘real’ tilemap.

One extremely handy feature of class type tiles is, since you can hold references to other tiles, you link your neighbors to each other, which makes certain things a lot either to do, like pathfinding.

poohshoes · June 24, 2020, 3:46pm

There is a lot of good and bad advice in this thread and I think the best solution is to make something and then run a profiler on it to see how fast it is and make changes from there to see how they affect the performance.