Game.EndDraw take 77% of draw time

Alkher · May 22, 2017, 2:59pm

How many objects/materials do you have each frame?

I have an NVIDIA 620, and in cases with very numerous objects and numerous materials, it drops to 10 FPS or below.
On an 870M it runs fine at 120fps. Maybe use a “lower” profile when on the 540: in my engine, when coding on the 620, I use lower specs (LOD, number of details, etc., aggressively lowered) so I don't have to worry about that.
But it helps to see what happens when a user has a low-end computer and to test and reproduce their problems or, better, to improve algorithms and optimize code.

Payet_Romain · May 22, 2017, 4:22pm

My test scene has 266 models.

A computer has 12 models on average; each model has 3 to 10 materials (5 on average).
Only motherboards and certain graphics cards have more than 6 materials.
The most common ones (drive, RAM and fan) have only 2 or 3 materials.

All textures are atlased in the same file, and all normal maps as well.
…

On the GTX1080 I can run this scene at 66fps with a 2160p resolution
On the GT540M I have 8 to 10 fps in 720p

Alkher · May 22, 2017, 4:31pm

Did you profile with a tool to narrow down the bottleneck?
It could be the way you do culling, sorting, etc.

Payet_Romain · May 22, 2017, 7:53pm

Yes, that is what I am trying to do right now.

If I scope the profiling to Game.Draw, I get this:

This profiling has been done on my desktop computer.

Effect127 is the effect used on all my models.

WorldModel is a class I use to hold the Model, position, rotation, hitbox, etc. for each object.

3.1% WorldModel.EffectParam(…) applies the material, world, view, etc. to the effect.

1.3% WorldModel.DrawMesh(…) is a slightly modified version of Microsoft.Xna.Framework.Graphics.ModelMesh.Draw()

5.7% WorldModel.Refresh(…) is basically this in pseudo code:

foreach (mesh in this.Model) {
    GetTheMaterial
    ApplyMaterialAndMatrix // with WorldModel.EffectParam(…)
    DrawMesh               // with WorldModel.DrawMesh(…)
}

5.4% Game_Bank.GetTexture returns one of the two textures shown in my previous post, with the coordinates of the needed area.

Alkher · May 23, 2017, 6:48am

Some ideas coming…
Do you set effect parameters even when they haven't changed?
Do you cache them or call them with parameter[thestring]?
Have you tried sorting the meshes by material to lower the number of calls?
Maybe MRT could help by drawing the 6 materials in one for loop, and then combining the results.
Or 2 loops of 3 if the hardware does not support up to 8.
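
The sorting idea can be sketched independently of the renderer. This is only an illustration, not the engine's actual code: DrawItem, MaterialId and MaterialBatcher are hypothetical names, and the sketch assumes every drawable part carries some stable key for its material. It groups the parts of the whole scene by material so that each material would be applied once, and counts the material switches a given draw order causes.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for one drawable mesh part and the material it uses.
public sealed class DrawItem
{
    public string Name;
    public int MaterialId;   // assumed: any stable per-material key would do
}

public static class MaterialBatcher
{
    // Groups the items by material, so the caller can apply each material
    // once and then draw every item that shares it.
    public static List<KeyValuePair<int, List<DrawItem>>> GroupByMaterial(IEnumerable<DrawItem> items)
    {
        return items
            .GroupBy(i => i.MaterialId)
            .OrderBy(g => g.Key)
            .Select(g => new KeyValuePair<int, List<DrawItem>>(g.Key, g.ToList()))
            .ToList();
    }

    // Counts how often the material changes when drawing in the given order
    // (the first item counts as one switch).
    public static int CountMaterialSwitches(IEnumerable<DrawItem> drawOrder)
    {
        int switches = 0;
        int? current = null;
        foreach (var item in drawOrder)
        {
            if (current != item.MaterialId)
            {
                switches++;
                current = item.MaterialId;
            }
        }
        return switches;
    }
}
```

With an unsorted list the switch count can be as high as the number of parts; after grouping it equals the number of distinct materials. Whether that translates into a measurable gain depends on how expensive each material change is on the target hardware.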

Payet_Romain · May 23, 2017, 12:31pm

I cache all parameters like this:

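Dictionary<string, Vector3> _cachedVector3 = new Dictionary<string, Vector3>();

void SetValue(string key, Vector3 value) {
    // Skip the parameter update when the cached value is unchanged.
    // TryGetValue also covers a key that is not cached yet
    // (the indexer would throw KeyNotFoundException in that case).
    Vector3 cached;
    if (this._cachedVector3.TryGetValue(key, out cached) && cached == value) return;

    this._cachedVector3[key] = value;
    this.Parameters[key].SetValue(value);
}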

I have the same code for Matrix, Vector4, float, etc.

Do you mean sorting the meshes within each model or across the entire scene?
I don't sort them at the moment, but it's a good idea.

I don't use MRT. Do you have a link where I can learn more about it?

Payet_Romain · May 23, 2017, 3:53pm

OK, I have profiled Game.DoDraw on the laptop.

Apparently OpenTK.Platform.Windows.WinGLContext.SwapBuffers() (from gdi32full.dll) is drastically slower on the 540M.

I think this is what @nkast said earlier about the CPU waiting for the GPU before swapping the buffers.

But there is something I don't understand… For testing purposes I have disabled many things in my shader and divided all texture and normal map sizes by 3. With the GTX1080 I got 90fps, but on the 540M I still got less than 10fps, with no gain at all.

Alkher · May 27, 2017, 2:56pm

With my engine, on a GT620 I get “only” 45fps, while I get 130fps on a 670M for the exact same scene… So I guess generations before 6xx are not good at DX11, whereas they were fine with XNA 4 (DX9).

Payet_Romain · May 27, 2017, 9:16pm

I actually use OpenGL, but the gap between these two generations is interesting.

MrValentine · June 5, 2017, 8:57am

What are the dimensions of those texture atlases?

Payet_Romain · June 9, 2017, 2:11am

The atlas is about 6000x3000 but, as I said earlier, even if I divide all texture and normal map sizes by three, to around 2000x1000, I still get the same FPS.

Alkher · June 9, 2017, 6:33am

One thing I thought about: what profile are you using?
Caching the parameters like this after the effect has been loaded could help:

EffectParameter myparam1 = myeffect.Parameters["nameofparaminthehlsl"];
EffectParameter myparam2 =…

To use it in update or draw:

myparam1.SetValue(thevalue);

About multiple render targets, I don't have any tutorial at hand; the websites from my learning days are now closed, but there must be some remaining tutorials on Riemers' website.
Whichever tutorial you choose, use RenderTargetBinding to make things a little faster.

Payet_Romain · June 13, 2017, 9:05am

I have tried Reach and HiDef, but I get the same performance.

Why is caching EffectParameter like this better?

I don't really understand how rendering to multiple targets can be useful in my case.

I thought multiple render targets were only used for multiple points of view, or am I wrong?

Jjagg · June 13, 2017, 9:13am

EffectParameters are looked up every time if you don’t cache them.

There’s always the possibility of a driver issue. Are your drivers up to date?

Payet_Romain · June 13, 2017, 9:20am

I cache the values, as I said in this post: Game.EndDraw take 77% of draw time

Yes all drivers are up to date.

Alkher · June 14, 2017, 6:31am

Only your keys are cached, in a dictionary, which has its own lookup too, so you do the work twice. “Caching” the value under the parameter's key still means a lookup in the parameter collection every time you need to set the value.
But you are probably only losing about 5 fps or a little more, so it's not very noticeable.
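
The difference between the two patterns can be shown without MonoGame. The sketch below does not use the real EffectParameter types: FakeParameter and FakeParameterCollection are hypothetical stand-ins whose only purpose is to count name-based lookups, on the assumption that indexing a parameter collection by string costs something on every call.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for an effect parameter; only the lookup pattern matters here.
public sealed class FakeParameter
{
    public float Value;
    public void SetValue(float value) { Value = value; }
}

// Hypothetical stand-in for a parameter collection indexed by name.
public sealed class FakeParameterCollection
{
    private readonly Dictionary<string, FakeParameter> _byName =
        new Dictionary<string, FakeParameter>();

    public int Lookups;   // number of name-based lookups performed so far

    public void Add(string name) { _byName[name] = new FakeParameter(); }

    public FakeParameter this[string name]
    {
        get { Lookups++; return _byName[name]; }
    }
}

public static class LookupDemo
{
    // Looks the parameter up by name on every frame.
    public static int SetByName(FakeParameterCollection parameters, int frames)
    {
        for (int i = 0; i < frames; i++) parameters["World"].SetValue(i);
        return parameters.Lookups;
    }

    // Looks the parameter up once, then reuses the reference.
    public static int SetCached(FakeParameterCollection parameters, int frames)
    {
        FakeParameter world = parameters["World"];
        for (int i = 0; i < frames; i++) world.SetValue(i);
        return parameters.Lookups;
    }
}
```

In the first method the number of lookups grows with the number of frames; in the second it stays at one. A value cache keyed by string, as in the Dictionary version, adds a second lookup on top of the first pattern.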

MRT should be the way to go as you need many renders of the scene with different methods.

Payet_Romain · June 15, 2017, 11:10am

You were right: I removed my Dictionary and now use EffectParameter instead.
I got +4fps on the desktop platform and +1fps on average on the laptop.

I have an idea, but it is kind of difficult to implement…

I mean… I have many computers with multiple meshes and materials in my game, but not all of them are in use at the same time.

Do you think it would be a good idea to take the unused computers and, in a background thread, combine the meshes of each component by merging the vertex buffers and setting the UV map offset directly in one new model?

If I do this correctly, every unused computer will have only one ModelMesh with one ModelMeshPart and should need only one draw call, or two with the transparent parts.

I don't really know how to create an entire model on the fly; I have tried some workarounds without success so far.

Is there a correct way to combine ModelMesh or ModelMeshPart?

Alkher · June 15, 2017, 11:19am

Do you display all items in the scene or only the visible ones? I'm not sure I understand.
If you need culling, it is easy: create a BoundingFrustum from your camera and, for each item you need to display, check whether any part of it is visible:

ContainmentType _CtObject = _BoundingFrustum.Contains(this._BoundingBox);

if (_CtObject != ContainmentType.Disjoint)
{
    // At least it is intersecting one of the camera's 6 planes, or all included
    // So draw this! (apply effect params etc.)
}

If you want to go the MRT way, I think reading this tutorial for deferred rendering (which uses MRT) can help you:
http://homepage.lnu.se/staff/tblma/Deferred%20Rendering%20in%20XNA%204.pdf
Especially the section “Rendering to Multiple Render Targets”.

Payet_Romain · June 15, 2017, 12:01pm

I already cull objects that are not in view with BoundingFrustum.

What I want to try is merging the multiple ModelMesh of a computer into one ModelMesh.
Basically, take all the vertices and put them in one vertex buffer to be able to draw the entire computer in one draw call.

But I couldn't find anything useful about combining models.

Thanks for the link, I will read that.
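
The CPU-side part of such a merge can be sketched as follows. This is a rough illustration under several assumptions, not a recipe for Model/ModelMesh: MeshData, MeshPartInput and MeshMerger are hypothetical types, the vertex data is assumed to be already available as arrays, System.Numerics vectors stand in for the framework's own, and normals and other vertex channels are left out. Each part's positions are moved into a shared space, its UVs are remapped into its atlas region, and its indices are shifted by the number of vertices already written.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical CPU-side copy of one mesh part.
public sealed class MeshData
{
    public Vector3[] Positions;
    public Vector2[] TexCoords;
    public int[] Indices;
}

// Hypothetical input: a part plus its placement and its region in the atlas.
public sealed class MeshPartInput
{
    public MeshData Mesh;
    public Matrix4x4 World = Matrix4x4.Identity;
    public Vector2 UvOffset = Vector2.Zero;   // assumed: top-left of the atlas region
    public Vector2 UvScale = Vector2.One;     // assumed: size of the atlas region
}

public static class MeshMerger
{
    public static MeshData Merge(IEnumerable<MeshPartInput> parts)
    {
        var positions = new List<Vector3>();
        var texCoords = new List<Vector2>();
        var indices = new List<int>();

        foreach (var part in parts)
        {
            // Indices of this part must point past the vertices written so far.
            int baseVertex = positions.Count;

            foreach (var p in part.Mesh.Positions)
                positions.Add(Vector3.Transform(p, part.World));

            // Bake the atlas region into the UVs (component-wise scale, then offset).
            foreach (var uv in part.Mesh.TexCoords)
                texCoords.Add(uv * part.UvScale + part.UvOffset);

            // The order within each triangle is kept, so the winding stays the
            // same as long as the world matrix does not mirror the part.
            foreach (var i in part.Mesh.Indices)
                indices.Add(i + baseVertex);
        }

        return new MeshData
        {
            Positions = positions.ToArray(),
            TexCoords = texCoords.ToArray(),
            Indices = indices.ToArray()
        };
    }
}
```

The merged arrays would then be uploaded once to a vertex buffer and an index buffer and drawn with a single call. Transparent parts would presumably go into a second merged mesh, which matches the "one draw call or two" estimate above.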

Alkher · June 15, 2017, 12:50pm

Hmm. That could be possible but a little complicated (the indices must be respected, since they determine face orientation and so what gets culled by the rasterizer).
You can try, but I'm not sure you will gain a lot of performance. MRT will be faster (and lead you to post-processing effects if needed).
But you could also try to cull ModelMesh/MeshParts instead (building their bounding boxes at load time or, better, with a custom model processor).
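
Building a box per mesh part at load time can be sketched like this. The code is framework-independent and only illustrative: Aabb and MeshBounds are hypothetical names, System.Numerics types stand in for the framework's own, and the plane test assumes plane normals that point towards the inside of the view volume, which is a convention chosen for this sketch and not necessarily the one BoundingFrustum uses.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

// Hypothetical axis-aligned box for one mesh part.
public struct Aabb
{
    public Vector3 Min;
    public Vector3 Max;
}

public static class MeshBounds
{
    // Computes the box once from the part's vertex positions (e.g. at load time).
    public static Aabb FromPoints(IReadOnlyList<Vector3> points)
    {
        if (points.Count == 0)
            throw new ArgumentException("No points given.", nameof(points));

        Vector3 min = points[0], max = points[0];
        for (int i = 1; i < points.Count; i++)
        {
            min = Vector3.Min(min, points[i]);
            max = Vector3.Max(max, points[i]);
        }
        return new Aabb { Min = min, Max = max };
    }

    // True when the whole box lies behind the plane. Assumes the plane normal
    // points towards the inside of the view volume.
    public static bool IsOutside(Aabb box, Plane plane)
    {
        // Pick the box corner that lies farthest along the plane normal.
        var corner = new Vector3(
            plane.Normal.X >= 0 ? box.Max.X : box.Min.X,
            plane.Normal.Y >= 0 ? box.Max.Y : box.Min.Y,
            plane.Normal.Z >= 0 ? box.Max.Z : box.Min.Z);
        return Plane.DotCoordinate(plane, corner) < 0;
    }
}
```

In the actual game the framework's own BoundingBox and the BoundingFrustum.Contains check shown earlier would do the same job per ModelMesh; the point of the sketch is only that the boxes are computed once and reused every frame.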