RenderTarget GetData: best timing?

Hello fellow devs,

I’m evaluating the use of RenderTarget.GetData to measure the light exposure of the player’s sprite in my 2D DesktopGL game.

The concept is simple:

  • Lights are sprites rendered into a RenderTarget, then blended with the scene.
  • I have a specific, very small (32×32) RenderTarget where only the lights affecting the player are rendered.
  • Every 500 ms, I render the player’s lights into the small RT, then I request the RT data.
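For readers landing here, a minimal sketch of what that setup looks like (identifiers and the averaging logic are mine, not the OP’s actual code):

```csharp
// Created once at startup, e.g. 32x32 — never recreated per frame.
RenderTarget2D _exposureRT;
Color[] _pixels = new Color[32 * 32];

float MeasureExposure(GraphicsDevice device, SpriteBatch batch)
{
    device.SetRenderTarget(_exposureRT);
    device.Clear(Color.Black);
    batch.Begin(SpriteSortMode.Deferred, BlendState.Additive);
    // ... draw only the light sprites that overlap the player ...
    batch.End();
    device.SetRenderTarget(null);

    _exposureRT.GetData(_pixels);   // the call whose cost varies wildly below
    float sum = 0f;
    foreach (var c in _pixels)
        sum += (c.R + c.G + c.B) / (3f * 255f);
    return sum / _pixels.Length;    // 0 = dark, 1 = fully lit
}
```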

Now here’s my problem.
I’m testing this project on several machines, with some puzzling results.

On 2 Linux machines:

  • one is a powerful machine with an AMD CPU and an Nvidia GPU
  • the other is a considerably weaker laptop with AMD graphics

Result: ~8 ms to get the RT data.

On 2 Windows machines:

  • both have the same graphics card (a GTX 960) and a decent Intel CPU
  • W11 and W10

Result: 49 ms on one, 33 ms on the other, to get the RT data.

Worth knowing:

  • I also ran tests on consoles, where the results were under 10 ms, IIRC.
  • Changing the RT size does not lead to significant performance changes.
  • I tested getting the RT data during Update and during Draw; the results are the same.

As you can see, the differences are significant, and on the 2 Windows machines the time needed to read the data is not compatible with a smooth playing experience.

I honestly don’t see why such differences are possible.

Now my questions :

  • Is there a “state of the art” approach to reading data back from the GPU?
  • A specific timing in the game loop?
  • A specific thread?
  • Some wizardly memory alignment to set?
  • Should Windows machines be using the DX backend rather than DesktopGL?

I would like to be sure that I’ve tried everything and done my best before having to implement another solution for my needs.

Thanks for any answers.


Could you not reduce the size of the RT? Or do you need pixel accuracy?

I would probably not grab it every n time span but every n frames.

Are you creating the RT each time? If you are, don’t: create it once at the size you need and reuse it.

Well, I already ran those tests:

  • Reducing the RT size did not change anything; I reduced it to 8×8 pixels.
  • The RT is read approx. every 500 ms via a timer bound to the game loop, so basically it’s already every n frames.
  • The render target is not created every time.

The code is the same between the acceptable 8 ms machines and the unacceptable 49 ms one.

If you are doing any form of threading, it could be giving you timing issues.

Windows threading is amazing (not sure about Linux), but as you’re hitting the GPU, there is a bottleneck there, I suspect. Compute would be a way to go; I haven’t tried it with MG though.

Interesting, but no threading here, everything is done in the game loop.
So yeah, my main question is about this particular topic: what would be the best way/timing to call GetData so that the duration of the operation stays consistent?

Forgive my ignorance, but how can you guarantee a call every half a second without threading?

It’s neither accurate nor guaranteed; it’s an approximation using a timer inside the game loop.
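Something like this, a common accumulator pattern (names are illustrative, not the OP’s actual code):

```csharp
// Accumulate elapsed time each frame; fire roughly every 500 ms.
double _readbackTimer;
const double ReadbackInterval = 0.5;

protected override void Update(GameTime gameTime)
{
    _readbackTimer += gameTime.ElapsedGameTime.TotalSeconds;
    if (_readbackTimer >= ReadbackInterval)
    {
        _readbackTimer -= ReadbackInterval;
        ReadPlayerExposure(); // hypothetical: renders the small RT, calls GetData
    }
}
```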

Pulling data back from the GPU can be slow and is not recommended. I believe the CPU will request the data, then wait until it receives it. I don’t think it’s normal practice in a tight game loop.

What do you need the render target data for? If it’s for anything graphical, can you create a shader to use it directly on the GPU?

If it’s only a 32×32 render target you’re trying to get the data from, can you create something on the CPU that will generate the light level?
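For instance, exposure can be approximated analytically from the light list instead of reading pixels back. This is a hypothetical sketch assuming point lights with linear falloff; `PointLight` and its fields are invented for illustration, and the falloff should match whatever the light sprites actually use:

```csharp
// Invented light description; adapt to the game's real light data.
struct PointLight
{
    public Vector2 Position;
    public float Radius;
    public float Intensity; // 0..1
}

float ExposureAt(Vector2 position, IEnumerable<PointLight> lights)
{
    float total = 0f;
    foreach (var light in lights)
    {
        float d = Vector2.Distance(position, light.Position);
        if (d < light.Radius)
            total += light.Intensity * (1f - d / light.Radius); // linear falloff
    }
    return MathHelper.Clamp(total, 0f, 1f);
}
```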

Thanks…
I already know all this…
Most of the questions I’m being asked are already answered in the first post.

Remember, my questions were more about mitigating the time differences using best practices in the art of getting data from the GPU, before having to fall back to the CPU.

For example:

If I’m able to get the GPU data in 8 ms on a low-end Linux machine, why is it not the case on a significantly more powerful Windows machine?
(Again, this is happening regardless of the size of the render target.)

Does the low-end machine have integrated graphics? I believe an integrated GPU and the CPU share memory.

I’m just guessing for ideas here, though.

Ah, that would have been an interesting path to follow indeed!

Unfortunately, the best performance is achieved on Linux: one computer with integrated graphics, and the same perf on one with a dedicated GPU.

I booted the “powerful” Linux machine into W10 to give it a try.

I get the RT data in less than 1 ms…

It’s not easy to make sense of all this.

Get a chart going and evaluate all the factors:

RAM type
RAM size
RAM speed

Integrated Graphics? [iGPU]
Dedicated GPU? [dGPU]

Is the iGPU sharing memory with a dGPU? Is the dGPU using host shared memory? There are so many factors in this regard, actually… open Task Manager and view the activity and swap data…

Is the app running on the iGPU or the dGPU?

CPU age/perf

Temps CPU/GPU/Mobo/StorageDevice

Storage performance: SATA 1/2/3, SATA III over M.2, M.2 NVMe [does the NVMe have cooling? If so, passive or active?]

What are the base clocks for the CPUs?

Then start graphing out the performance, loads, and timings; this should more or less help you identify why certain rigs give certain results, and may help you isolate the anomalies…

Have you got a screenshot of the effect? Someone may be able to give you a better method with a visual cue, if permitted copyright-wise…

Good luck!

Have you tried the effect in DX for comparison?

Are you using transparency with that light effect?

I concur with the shader mention above…

Have you considered that the GPU’s memory [a mere 2 GB] might be a little dated?
Check the workload and swap in Task Manager on W10/11… bad drivers are a huge factor here too… pretty sure that GPU is no longer receiving modern drivers…

NVIDIA GeForce GTX 960 Specs | TechPowerUp GPU Database

You said a decent Intel CPU, but that GTX 960 tells me it is an ‘old decent Intel CPU’… beside the point, but can you show the intent? I suspect you could use a loaded asset and reshape it as required using quads…

Long story short: on Windows, use DX; anything else, err, I guess OpenGL, but Apple is a huge question mark to be honest lol :joy:

Anyway, if you can, post a screenshot, I suspect you are trying a 2D platformer with a torch/flashlight?

Torch Knight by LT DEV

Hope everyone is staying cool during this heatwave if you are in a country affected…

Good Luck!

Indeed my next move will be to test with DirectX.

My whole lighting system is tested and working from previous games, so it’s not about using a shader or such; it’s about introducing a new piece of logic: gathering the amount of light received at a defined position.

I could do it on the CPU, but using the GPU data would be so much easier because all the lighting would already have been calculated.
Also, I’d like to understand what is the bottleneck here, call it curiosity :slight_smile:

In all honesty, I can’t mentally accept that the combination of a CPU, even a decade old, with a graphics card that I also consider decent, could take so much time bringing 8×8 pixels back from GPU memory.


I did point out the drivers being a factor…


More than one object?

Yeah, you’re right, and I agree; that’s why I’ll try DirectX to see if the drivers are better on “not so recent” hardware.
Funnily enough, I got better perf on an Iris Xe than on the GTX 960…

Nope, just the amount of light at a defined position (= the player position). It’s really only one GetData call once in a while.

Why would this be funny? Xe is cutting edge…

I still don’t understand what you are trying to achieve… if it is a combination of values from light sources, why not calculate it based on the light sources within an area? That way you can just use a contact square…

Anyway, hope you find a solution to this issue… :sake:

Well, I had the impression it was a kind of sub-par graphics chipset; looks like it was a wrong impression.

That’d be the CPU solution I planned to implement, but I would have loved to be able to grab the same info, computed on the GPU with the same logic as the rest of my lighting.
As I said, this question is more about seeing whether a GPU solution is possible before falling back to the CPU.


Have you looked at shaders yet, as mentioned above?

For some reason, a lot of people decided to contribute here while sadly lacking the knowledge, creating unnecessary confusion.
Getting this data can be, and generally is, very fast; it just needs to be done correctly. The rule of thumb is to have a few frames’ delay (say, three) between copying to a staging buffer and pulling the data to the CPU, to prevent a stall. This will require a bit of poking in the source code to achieve.
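At the public-API level (without touching MG’s internals or staging buffers), a rough approximation of that delayed-readback idea is to rotate through a small pool of render targets and only call GetData on the one rendered a few frames ago, so the driver has most likely finished with it and the read doesn’t force a full pipeline stall. This is a sketch under that assumption; `RenderPlayerLights` is a hypothetical helper, and stock GetData may still synchronize on some drivers:

```csharp
const int Delay = 3;                               // frames between write and read
RenderTarget2D[] _pool = new RenderTarget2D[Delay]; // each 32x32, created once
Color[] _pixels = new Color[32 * 32];
int _frame;

void RenderAndRead(GraphicsDevice device)
{
    // Write into this frame's RT.
    var writeRT = _pool[_frame % Delay];
    RenderPlayerLights(device, writeRT); // hypothetical: draws the player's lights

    // Read the oldest RT in the pool (rendered Delay-1 frames ago).
    if (_frame >= Delay - 1)
    {
        var readRT = _pool[(_frame + 1) % Delay];
        readRT.GetData(_pixels);         // likely no stall by now
    }
    _frame++;
}
```

The trade-off is that the exposure value is a few frames stale, which is harmless for a reading taken every 500 ms anyway.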

However, I have a far better alternative for you (just remember what I wrote above; you will need it at some point in game dev). Grab the Compute branch of MG and do this in a compute shader. You can do it per frame without issues: measure the exposure there and utilize the result as you see fit.