Anyone using the Task Parallel Library?

So I am curious whether anyone in the MonoGame community (or any other .NET game dev community) has experience using TPL in their games. I love the ease and functionality of TPL, but I haven’t really used it in the sort of context a game requires (e.g. creating and running tasks every frame). Understandably, the garbage generated in that case is a concern.

There really doesn’t seem to be a whole lot of information out there on this subject. I did find something from a few years back: a question posed by one of MonoGame’s developers about pooling tasks. The general attitude seemed to be that the garbage will be short-lived and not promoted to a higher generation, making GC stalls infrequent. The attitude of MonoGame (and also my attitude) is to be as garbage-free as possible. I say this also after just reading the rather longish GitHub issue thread on changing BoundingBox::CreateFromPoints to not use IEnumerable… :smile:

So while it may be a nice theory that creating a ton of tasks per update won’t hurt performance, I’m interested in whether anyone has actually done this in practice with TPL. Or have they tried it, observed negative performance, and rolled their own (nearly) garbage-free thread pooling scheme?

I’ve been using Parallel.For(…), Parallel.ForEach(…), and some of PLINQ’s .AsParallel().Unordered() stuff for my particle engine, and it seems to run alright.
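For anyone curious what that looks like, here’s a minimal sketch of the per-frame Parallel.ForEach pattern. The Particle type and its fields are made up for illustration; the point is just that when each item’s Update is independent, the loop body is safe to parallelize.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical particle type; a real engine's fields will differ.
class Particle
{
    public float X, Y, VX, VY;

    public void Update(float dt)
    {
        X += VX * dt;
        Y += VY * dt;
    }
}

class ParticleSystem
{
    public readonly Particle[] Particles = new Particle[10_000];

    public ParticleSystem()
    {
        for (int i = 0; i < Particles.Length; i++)
            Particles[i] = new Particle { VX = i % 7, VY = i % 3 };
    }

    // Per-frame update: each particle only touches its own state,
    // so there is no shared mutable data to lock.
    public void Update(float dt)
    {
        Parallel.ForEach(Particles, p => p.Update(dt));
    }
}
```

Note that Parallel.ForEach blocks until all items are processed, so the game loop stays in sync; the allocations it makes per call are exactly the garbage question raised above.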

I haven’t done any tests yet, but I suspect it’s actually running slower (due to all the thread spin-up / spin-down and other thread pool overhead) than if it weren’t multi-threaded. Either way, I’m at 60 fps / 60 lps. If there’s a way to tell MonoGame to run as many logics per second as possible, I could benchmark the two and see what’s up. (I think there is, but I haven’t checked yet.)
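If I remember right, MonoGame (like XNA) does let you uncap the update rate. Something like this in the Game subclass constructor should do it, assuming the standard Game / GraphicsDeviceManager setup:

```csharp
// Run Update() as often as possible instead of the default
// fixed 60 updates per second, and don't wait for vsync either.
graphics = new GraphicsDeviceManager(this);
IsFixedTimeStep = false;
graphics.SynchronizeWithVerticalRetrace = false;
```

With both of those off, updates-per-second becomes a rough throughput benchmark.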

Honestly, as far as GC goes, I’m not too worried about it; I think there’s more overhead in queuing work onto the thread pool than there is in typical GC. (The GC in the modern desktop versions of .NET is pretty awesome actually, even for game dev; compared to the GC in, say, Java, it tends to be a non-issue.)

One of the things I’ve been thinking about is using the built-in (pre-.NET 4.0) threading primitives: spin up several threads that run an infinite loop (so you don’t need to pass in a delegate every time), give each thread data that it exclusively owns, and use old-fashioned wait signals (AutoResetEvent, for example). I feel like this would be more performant in the long term anyway. It would definitely avoid the GC issues you mention, but more importantly, it wouldn’t have all the thread pool management overhead.
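To make that concrete, here’s a minimal sketch of the pattern I mean: a long-lived worker thread that owns its slice of data and is woken once per frame. The names (Worker, BeginFrame, WaitFrame) are mine, not from any library, and the `data[i] += 1f` body is just a stand-in for a real per-item update.

```csharp
using System;
using System.Threading;

// Long-lived worker that exclusively owns one slice of the data.
// The game loop signals frameStart once per frame and waits on frameDone.
class Worker
{
    readonly AutoResetEvent frameStart = new AutoResetEvent(false);
    readonly AutoResetEvent frameDone = new AutoResetEvent(false);
    readonly float[] data;           // owned by this thread alone; no locking needed
    volatile bool running = true;

    public Worker(float[] slice)
    {
        data = slice;
        new Thread(Loop) { IsBackground = true }.Start();
    }

    void Loop()
    {
        while (running)
        {
            frameStart.WaitOne();    // sleep until the game loop wakes us
            if (!running) break;
            for (int i = 0; i < data.Length; i++)
                data[i] += 1f;       // stand-in for the real per-item update
            frameDone.Set();         // tell the game loop this slice is done
        }
    }

    public void BeginFrame() => frameStart.Set();
    public void WaitFrame() => frameDone.WaitOne();
    public void Stop() { running = false; frameStart.Set(); }
}
```

The game loop would call BeginFrame() on every worker, do any main-thread work, then WaitFrame() on each; no delegates or tasks are allocated per frame.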

Huh. What’da’ya know. I took a closer look, and here’s what I found:

I rigged something up so that my “Update” method would run as many times per second as possible without dropping my FPS, and using TPL is definitely faster for me. It doesn’t scale linearly with the number of cores I’ve got, but it’s over 2x faster, so that’s neat.

On my 4.0 GHz Quad Core i7 (with HT; so 8 “cores”), I get about 5,500 LPS with no parallelization.
With Parallel.ForEach, I’m drifting between 10,700 and 14,500 LPS. – Though, that starts getting my fan pretty loud (which is kind of hilarious).

Will try it at home later with a wimpier setup…

Edit: Note, the numbers above were in DEBUG mode. Derp. Ran it in Release and got closer to what I originally suspected: the game seems to run at about the same speed whether it’s parallel or not. I’m thinking the TPL’s overhead must be eating the gains in Release mode (and debug-mode overhead could also explain the non-linear speedup I saw earlier).

Edit #2: Okay, I figured out a couple of things. First, I’m not really using TPL Dataflow (it turns out I didn’t fully understand what that library was); I’m just using the simple constructs like Parallel.ForEach, etc. Second, I figured out that my big bottleneck was removing dead items from a List. I was removing the dead items by hand after every frame, and each removal potentially required a search of the entire list (i.e. “list.Remove(item);”), since it wasn’t being done by index.
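To show the difference, here’s a small sketch comparing the two removal strategies. Removing by value scans the list on every call, which goes quadratic as the dead count grows; removing by index from highest to lowest skips the search and also keeps earlier removals from shifting the later indices. (These helper names are just illustrative.)

```csharp
using System;
using System.Collections.Generic;

static class ListRemoval
{
    // The slow way: each Remove() does a linear search for the item.
    public static void RemoveDeadByValue(List<int> items, List<int> dead)
    {
        foreach (var item in dead)
            items.Remove(item);
    }

    // The fast way: remove by index, highest index first,
    // so removals don't invalidate the indices still to come.
    public static void RemoveDeadByIndex(List<int> items, List<int> deadIndices)
    {
        deadIndices.Sort();
        for (int i = deadIndices.Count - 1; i >= 0; i--)
            items.RemoveAt(deadIndices[i]);
    }
}
```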

I should probably just learn TPL Dataflow or Akka.NET and figure out how those models work before going any further, but I had an idea I wanted to try out first, and it worked pretty well. (I’m getting more updates per second in the parallel version than the non-parallel version, even in Release mode.)

Basically, what I did was write my own GameList class as a kind of ConcurrentList (since there isn’t one in .NET). Essentially, it’s a read-only list that you can stage changes to; once per frame, when you’re at a good spot to lock the list, you flush those changes into the list all at once.

The idea is that on a massive list, where each item has an Update() method, you want to parallelize those calls to Update(), and those calls should be able to change the contents of the list. But the contents of the list don’t change that drastically from frame to frame, so it’s okay to just stage the changes and then flush them in at the end of your game loop, once it’s safe.

This is how I designed my list:

  1. Adds / Removes are staged until a Flush is called.
  2. Add just takes an item, but remove takes an index; this is fine because:
  3. When you enumerate the list you get back KeyValuePair<int,T>s, where the key is the index and the value is the item. This lets you always have access to the list index, even in a Parallel.ForEach (so if an item wants to remove itself, or inspect one of its neighbors, etc., it can easily do that).
  4. The added items are stored in a ConcurrentQueue<T>() until the flush happens.
  5. The removed indices are stored in a ConcurrentDictionary<int, bool> until the flush happens (the dictionary doubles as a set, so staging the same index twice is harmless). One caveat I got wrong at first: ConcurrentDictionary doesn’t actually guarantee any ordering of its keys, so they need to be sorted descending before the removes happen; doing that once per flush is cheap.
  6. When a flush is called (once per frame), we lock the whole list, remove the staged indices from highest to lowest (so earlier removals don’t shift the later indices), then add the staged items.
  7. The only other thing worth mentioning is the indexer: the getter just returns the item, but the setter locks the list before changing any values.
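A minimal sketch of what that design looks like in code (the names are mine, and I sort the removal keys explicitly at flush time rather than relying on dictionary key order):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Collections.Concurrent;
using System.Linq;

// Staged-changes list: Add/RemoveAt are safe from parallel Update() calls,
// and Flush() applies everything at once at a safe point in the frame.
class GameList<T> : IEnumerable<KeyValuePair<int, T>>
{
    readonly List<T> items = new List<T>();
    readonly ConcurrentQueue<T> staged = new ConcurrentQueue<T>();
    readonly ConcurrentDictionary<int, bool> removed = new ConcurrentDictionary<int, bool>();
    readonly object sync = new object();

    public void Add(T item) => staged.Enqueue(item);               // safe from any thread
    public void RemoveAt(int index) => removed.TryAdd(index, true); // duplicate removes collapse

    public int Count => items.Count;

    public T this[int index]
    {
        get => items[index];                     // plain read
        set { lock (sync) items[index] = value; } // writes take the lock
    }

    // Call once per frame, when no Update() calls are in flight.
    public void Flush()
    {
        lock (sync)
        {
            // Highest index first, so earlier removals don't shift later ones.
            foreach (int index in removed.Keys.OrderByDescending(i => i))
                items.RemoveAt(index);
            removed.Clear();

            while (staged.TryDequeue(out T item))
                items.Add(item);
        }
    }

    public IEnumerator<KeyValuePair<int, T>> GetEnumerator()
    {
        for (int i = 0; i < items.Count; i++)
            yield return new KeyValuePair<int, T>(i, items[i]);
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

During the frame you’d run something like `Parallel.ForEach(gameList, kvp => kvp.Value.Update(gameList, kvp.Key))`, and then call Flush() once everything has joined back up.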