Huh. What’da’ya know. I took a closer look, and here’s what I found:
I rigged something up so that my “Update” method would run as many times per second as possible without dropping my FPS, and using TPL is definitely faster for me. It doesn’t seem linearly faster compared to the number of cores I’ve got, but it’s over 2x faster, so that’s neat.
On my 4.0 GHz quad-core i7 (with HT, so 8 logical cores), I get about 5,500 LPS (loops per second) with no parallelization.
With Parallel.ForEach, I’m drifting between 10,700 and 14,500 LPS. – Though, that starts getting my fan pretty loud (which is kind of hilarious).
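For the record, the timing loop is roughly this shape (the Entity type and the work inside Update() are just stand-ins for illustration, not my actual game code):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class Entity
{
    public int Value;
    public void Update() => Value++; // stand-in for real per-frame work
}

class Program
{
    static void Main()
    {
        var items = Enumerable.Range(0, 100_000).Select(_ => new Entity()).ToArray();

        // Sequential baseline: one core does everything.
        foreach (var e in items) e.Update();

        // Parallel version: TPL partitions the array across worker threads.
        // This is only safe here because each Update() touches nothing but
        // its own entity.
        Parallel.ForEach(items, e => e.Update());

        Console.WriteLine(items.Sum(e => e.Value)); // every entity updated twice
    }
}
```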
Will try it at home later with a wimpier setup…
Edit: Note, the numbers above were in DEBUG mode. Derp. Ran it in Release and got closer to what I originally suspected – the game runs at about the same speed whether it’s parallel or not. I’m thinking the TPL’s scheduling overhead must be eating the gains in Release mode (and extra overhead in Debug mode would also explain the non-linear gains there…).
Edit #2: Okay, I figured out a couple of things. First, I’m not really using TPL Dataflow (turns out I didn’t fully understand what that library was) – I’m just using the simple constructs like Parallel.ForEach, etc. Second, my big bottleneck was removing dead items from a List – I was removing them by hand after every frame, and each removal potentially required a search of the entire list (i.e. “list.Remove(item);”), since it wasn’t being done by index.
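To illustrate why that hurt (a toy example – the real list held game entities, not strings): List&lt;T&gt;.Remove(item) has to scan from the front to find the item before it can remove it, while RemoveAt(index) skips the search entirely:

```csharp
using System;
using System.Collections.Generic;

class Program
{
    static void Main()
    {
        var list = new List<string> { "a", "b", "c", "d" };

        // Remove by value: scans the list from the front to find the item -> O(n) search.
        list.Remove("c");

        // Remove by index: no search at all. Removing indices in descending
        // order means earlier removals don't shift the indices still pending.
        var dead = new List<int> { 2, 0 }; // staged indices, highest first
        foreach (var i in dead) list.RemoveAt(i);

        Console.WriteLine(string.Join(",", list));
    }
}
```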
I should probably just learn TPL or Akka.NET and figure out how those models work before going any further, but I had an idea I wanted to try out first, and it worked pretty well. (I’m getting more updates per second in the parallel version than the non-parallel version – even in Release mode.)
Basically, what I did was write my own GameList class as a kind of ConcurrentList (since there isn’t one in .NET) – essentially, it’s a read-only list that you can stage changes to, and then once per frame, when you’re at a good spot to lock the list, you can flush those changes into the list all at once.
The idea being that on a massive list, where each item has an Update() method, you want to parallelize those calls to Update(), and those calls should be able to change the contents of the list. Since the contents don’t change that drastically from frame to frame, it’s okay to just stage the changes, and then flush them in at the end of your game loop once it’s safe.
This is how I designed my list:
- Adds / Removes are staged until a Flush is called.
- Add just takes an item, but remove takes an index; this is fine because:
- When you enumerate the list you get back KeyValuePair<int, T>s – where the key is the index and the value is the item – so you always have access to the list index, even in a Parallel.ForEach (if an item wants to remove itself, or inspect one of its neighbors, etc., it can easily do that).
- The added items are stored in a ConcurrentQueue<T>() until the flush happens.
- The removed indices are stored in a ConcurrentDictionary<int, bool> until the flush happens. (One gotcha I hit: ConcurrentDictionary doesn’t actually guarantee any enumeration order for its keys, so the indices have to be sorted into descending order before removal – a quick RemovedItems.Keys.OrderByDescending(i => i) takes care of it.)
- When a flush is called (once per frame), we lock the whole list, then remove the staged items by index in descending order (so that removing one item doesn’t shift the indices that are still pending), then add the staged items.
- The only other thing worth mentioning is the indexer – the getter just returns the item directly, but the setter locks the list before changing any values.
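Putting the bullets above together, here’s a stripped-down sketch of the idea (simplified – no bounds checking, and it assumes only one thread calls Flush, once per frame, while no Update() calls are in flight):

```csharp
using System;
using System.Collections;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// Sketch of the GameList idea: reads are plain, Add/RemoveAt are staged
// thread-safely, and Flush() applies all staged changes at once.
public class GameList<T> : IEnumerable<KeyValuePair<int, T>>
{
    private readonly List<T> _items = new List<T>();
    private readonly ConcurrentQueue<T> _added = new ConcurrentQueue<T>();
    private readonly ConcurrentDictionary<int, bool> _removed = new ConcurrentDictionary<int, bool>();
    private readonly object _gate = new object();

    public void Add(T item) => _added.Enqueue(item);                 // staged until Flush
    public void RemoveAt(int index) => _removed.TryAdd(index, true); // staged until Flush

    public int Count => _items.Count;

    public T this[int index]
    {
        get => _items[index];                       // getter returns the item directly
        set { lock (_gate) _items[index] = value; } // setter locks before writing
    }

    // Enumerate as (index, item) pairs so an item can remove itself or
    // inspect a neighbor by index, even inside a Parallel.ForEach.
    public IEnumerator<KeyValuePair<int, T>> GetEnumerator()
    {
        for (int i = 0; i < _items.Count; i++)
            yield return new KeyValuePair<int, T>(i, _items[i]);
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // Call once per frame, at a safe point in the game loop.
    public void Flush()
    {
        lock (_gate)
        {
            // Highest indices first, so earlier removals don't shift
            // the indices we haven't processed yet.
            foreach (var i in _removed.Keys.OrderByDescending(k => k))
                _items.RemoveAt(i);
            _removed.Clear();

            while (_added.TryDequeue(out var item))
                _items.Add(item);
        }
    }
}

class Program
{
    static void Main()
    {
        var list = new GameList<int>();
        list.Add(10); list.Add(20); list.Add(30);
        list.Flush();       // list is now 10, 20, 30

        list.RemoveAt(1);   // stage removal of the 20
        list.Add(40);       // stage an add
        list.Flush();       // apply both at once

        Console.WriteLine(string.Join(",", list.Select(kv => kv.Value)));
    }
}
```

In a real game loop, the Parallel.ForEach over Update() would sit between the Flush() calls, so nothing mutates the backing List while worker threads are reading it.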