A silly question.

Does anyone see why this is about 20% slower than the regular SpriteBatch when I’m not even switching textures? I’m trying to figure out what the magic is within SpriteBatch and friends that I’m missing. I popped in pointer indexing and that only got me about 10 more fps at the cost of using unsafe, so I removed it.

Also, why in the GL version would going from a smaller window to a maximized window make the frame rate so much slower? Am I the only one seeing this? In one extreme test case, I got a drop from 1500 fps to 150 fps just by enlarging the window to nearly maximized. It’s the weirdest thing. I see the same huge drop in fps with the regular SpriteBatch as well.

        public void DrawText
            (
            StringBuilder text, Vector2 position, Color color,
            float rotation, Vector2 origin, Vector2 scale,float depth
            )
        {
            unchecked
            {
                //precalculate
                var firstGlyphOfLine = true;
                var currentGlyph = defaultGlyph;
                var lineHeight = (float)tsf.LineSpacing * scale.Y;
                var Spacing = tsf.Spacing;
                Vector2 offset = Vector2.Zero;
                Vector2 q = new Vector2((float)(Math.Sin(rotation)), (float)(Math.Cos(rotation)));
                char c;
                for (var i = 0; i < text.Length; i++)
                {
                    c = text[i];
                    if (c == '\r')
                        continue;
                    if (c == '\n')
                    {
                        offset.Y += lineHeight;
                        offset.X = 0;
                        firstGlyphOfLine = true;
                        continue;
                    }
                    if (_glyphs.ContainsKey(c))
                        currentGlyph = _glyphs[c];
                    else
                        currentGlyph = defaultGlyph;
                    // calculate and scale
                    if (firstGlyphOfLine == false)
                        offset.X += Spacing + currentGlyph.LeftSideBearing * scale.X;
                    firstGlyphOfLine = false;
                    Vector2 localDrawXY = new Vector2(currentGlyph.Cropping.X, currentGlyph.Cropping.Y) * scale + offset;
                    Vector2 localDrawRB = localDrawXY + new Vector2(currentGlyph.BoundsInTexture.Width * scale.X, currentGlyph.BoundsInTexture.Height * scale.Y);
                    offset.X = (currentGlyph.Width + currentGlyph.RightSideBearing) * scale.X + offset.X;
                    
                    // prep vertices to directly draw
                    Vector2 lt = new Vector2(localDrawXY.X, localDrawXY.Y);
                    Vector2 lb = new Vector2(localDrawXY.X, localDrawRB.Y);
                    Vector2 rt = new Vector2(localDrawRB.X, localDrawXY.Y);
                    Vector2 rb = new Vector2(localDrawRB.X, localDrawRB.Y);
                    // rotate
                    if (rotation != 0)
                    {
                        lt = new Vector2(lt.X * q.Y - lt.Y * q.X, lt.X * q.X + lt.Y * q.Y);
                        lb = new Vector2(lb.X * q.Y - lb.Y * q.X, lb.X * q.X + lb.Y * q.Y);
                        rt = new Vector2(rt.X * q.Y - rt.Y * q.X, rt.X * q.X + rt.Y * q.Y);
                        rb = new Vector2(rb.X * q.Y - rb.Y * q.X, rb.X * q.X + rb.Y * q.Y);
                    }
                    // translate
                    lt += position;
                    lb += position;
                    rt += position;
                    rb += position;
                    // project
                    var LT = new Vector3(lt.X * cw - 1f, lt.Y * -ch + 1f, depth);
                    var LB = new Vector3(lb.X * cw - 1f, lb.Y * -ch + 1f, depth);
                    var RT = new Vector3(rt.X * cw - 1f, rt.Y * -ch + 1f, depth);
                    var RB = new Vector3(rb.X * cw - 1f, rb.Y * -ch + 1f, depth);
                    // uv coordinates in texture space (cu, cv are 1 / texture size)
                    float uvL = (float)currentGlyph.BoundsInTexture.Left * cu;
                    float uvR = (float)currentGlyph.BoundsInTexture.Right * cu;
                    float uvT = (float)currentGlyph.BoundsInTexture.Top * cv;
                    float uvB = (float)currentGlyph.BoundsInTexture.Bottom * cv;
                    Vector2 uv0 = new Vector2(uvL, uvT);
                    Vector2 uv1 = new Vector2(uvL, uvB);
                    Vector2 uv2 = new Vector2(uvR, uvT);
                    Vector2 uv3 = new Vector2(uvR, uvB);
                    // create the vertex quad
                    spriteVertices[vi_pointer + 0] = new VertexPositionColorTexture(LT, color, uv0);
                    spriteVertices[vi_pointer + 1] = new VertexPositionColorTexture(LB, color, uv1);
                    spriteVertices[vi_pointer + 2] = new VertexPositionColorTexture(RT, color, uv2);
                    spriteVertices[vi_pointer + 3] = new VertexPositionColorTexture(RB, color, uv3);
                    // create the indices
                    //
                    // LT 0   2 RT
                    //    |  /|     Triangle 1 is 0 1 2  ccw
                    //    | / |     Triangle 2 is 2 1 3  ccw
                    // LB 1   3 RB
                    triangleIndexs[ti_pointer + 0] = 0 + vi_pointer;
                    triangleIndexs[ti_pointer + 1] = 1 + vi_pointer;
                    triangleIndexs[ti_pointer + 2] = 2 + vi_pointer;
                    triangleIndexs[ti_pointer + 3] = 2 + vi_pointer;
                    triangleIndexs[ti_pointer + 4] = 1 + vi_pointer;
                    triangleIndexs[ti_pointer + 5] = 3 + vi_pointer;
                    // increment the counts and indices
                    currentTriangles += 2;
                    vi_pointer += 4;
                    ti_pointer += 6;
                    // check capacity
                    if (currentTriangles >= triangleCapacity - 2)
                    {
                        IncreaseCapacity();
                    }
                }
            }
        }

Well, here is what I’ve been working on while my internet was down: wrapped text that bypasses SpriteBatch entirely, drawing via DrawUserIndexedPrimitives and using the loaded SpriteFont directly.
Though this is still just a prototype, it’s basically a fancier version of the above with a small word-buffer helper class.

Taking a screenshot creates garbage, I guess.

I’m cheating here with alpha blending, though; this would break on separate calls. I still haven’t made a shader to do it properly; this is just using BasicEffect.

These all align, or can align, to SpriteBatch kerning. I already tested it, and I get per-pixel alignment. I yanked out SpriteEffects, though, because the more I reflect on it the more I just think it’s stupid and ugly.

But yeah, I really can’t see why that first method would be slightly slower than SpriteBatch. I really expected it to be faster, as it is doing less overall and is more compact.

Well, from a first look,
-You always calculate sin/cos, which are very expensive ops. SpriteBatch skips this calculation when rotation is 0. You can move it into the if (rotation != 0) block.
BTW, nice touch calling it q (-uaternion?)! (Well, technically it’s a complex number since it’s confined to just one plane, but whatever.)

-Too many temp variables and memory moving around. Memory accesses are the bottleneck on modern CPUs. Cache misses can be as expensive as or worse than Muls/Divs/Sin, etc.
Try to write directly to spriteVertices.Position/Color and avoid constructing new Vector2 and Vector3 structs.
But then, using a fixed pointer would improve accessing spriteVertices items multiple times. If you don’t like fixed pointers, an alternative would be to have a method and pass a ref VertexPositionColorTexture.
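A minimal sketch of that last suggestion, under some loud assumptions: `Vertex` is a hypothetical stand-in for VertexPositionColorTexture (defined here only so the snippet compiles without MonoGame), and `QuadWriter.Set` / `QuadWriter.WriteQuad` are made-up helper names. The idea is just to fill an array element in place through a `ref` rather than building temporary Vector2/Vector3 structs and copying a new vertex in. Whether it actually wins on a given runtime would need measuring.

```csharp
using System;

// Hypothetical stand-in for VertexPositionColorTexture so the sketch
// compiles without MonoGame. The field layout is an assumption.
public struct Vertex
{
    public float X, Y, Z;   // position
    public uint Color;      // packed color
    public float U, V;      // texture coordinate
}

public static class QuadWriter
{
    // Fills one vertex in place. The caller passes an array element by ref,
    // so no temporary vertex struct is constructed and copied.
    public static void Set(ref Vertex v, float x, float y, float z,
                           uint color, float u, float tv)
    {
        v.X = x; v.Y = y; v.Z = z;
        v.Color = color;
        v.U = u; v.V = tv;
    }

    // Writes the four corners of an unrotated glyph quad starting at index vi,
    // in the same LT, LB, RT, RB order used in the original post.
    public static void WriteQuad(Vertex[] vertices, int vi,
        float left, float top, float right, float bottom, float depth,
        uint color, float uvL, float uvT, float uvR, float uvB)
    {
        Set(ref vertices[vi + 0], left,  top,    depth, color, uvL, uvT);
        Set(ref vertices[vi + 1], left,  bottom, depth, color, uvL, uvB);
        Set(ref vertices[vi + 2], right, top,    depth, color, uvR, uvT);
        Set(ref vertices[vi + 3], right, bottom, depth, color, uvR, uvB);
    }
}
```

With the real vertex type the same pattern would write to its Position, Color and TextureCoordinate fields instead.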

Compared to the current develop branch?

PR #5401 is a lot faster than that, and if you combine it with #5453 it is nearly 2x faster than the current develop in the case of DrawString(SpriteFont, string, Vector2, Color, float, Vector2, Vector2, SpriteEffects, float).
Also, MichaelDePiazzi pointed out here that we should precalculate the glyphs’ texture UVs; I suppose we can get another 30-50% improvement from that. I will implement that as soon as #5401 gets merged.

So, you could use #5401 as a starting point and remove anything you don’t want (SpriteEffects).

Dropping it into the for loop is much worse when there is rotation and causes a noticeable hit. Since the sin/cos only needs to be calculated once, I pulled it out of the loop. I did, however, wrap the precalculation as Vector2 q = Vector2.Zero; if (…), which seems to have very little impact under rotation and should now get skipped when there is none. A long time ago I used lookup tables for sin/cos, but I doubt that would be faster nowadays.
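For reference, a small sketch of that guarded precalculation. `RotationSetup`, `Precalculate` and `Rotate` are hypothetical names, and plain floats are used instead of Vector2 so it compiles without MonoGame. One deliberate difference from the Vector2.Zero version described above: the no-rotation case returns (sin, cos) = (0, 1), the identity, so rotating with it leaves a point unchanged.

```csharp
using System;

public static class RotationSetup
{
    // Computes (sin, cos) once per draw call, outside the per-glyph loop.
    // When rotation is exactly 0 the two trig calls are skipped.
    public static (float Sin, float Cos) Precalculate(float rotation)
    {
        if (rotation == 0f)
            return (0f, 1f);   // identity rotation, no trig needed
        return ((float)Math.Sin(rotation), (float)Math.Cos(rotation));
    }

    // Same 2D rotation as in the posted loop: q.X is sin, q.Y is cos.
    public static (float X, float Y) Rotate(float x, float y, float sin, float cos)
    {
        return (x * cos - y * sin, x * sin + y * cos);
    }
}
```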

q (-uaternion?)

I like to use r as shorthand for result, so it’s sort of how I keep it in mind: r = p * q + o (result, position, rotation, origin), with t for translation and s for scale. Yeah, q can also represent quaternions. Plus it looks nice… so it’s all the easier to remember.

-Too many temp variables and memory moving around. Memory accesses are the bottleneck on modern CPUs. Cache misses can be as expensive as or worse

I’m thinking this is what’s going on. I’ll try to implement that range of advice.

than Muls/Divs/Sin, etc.

I tend to precalculate all my divisions into coefficients before dropping them into any loops. The same would apply to the texture UVs here, I think.

Also, MichaelDePiazzi pointed out here that we should precalculate the glyphs’ texture UVs,

I thought the same thing (though I didn’t implement it here): on load, or when a window resize occurs, make my own glyph class that has an extra set of 4 UV members. Take the coefficients of the texture width and height, multiply them by the bounds, and create the UVs so the glyph holds them directly.
Which is pretty much what cu and cv are doing (inefficiently) in the function.

Basically:
uvLT.X = bounds.X * (1f / image.Width);
cu and cv in my code are just that: 1f / image.Width and 1f / image.Height.
So if you did that on load, or when the window is resized, the result is just sitting in the glyph, ready to be taken out: glyph.uvLT;

I don’t really see the point of requiring performance while the user is resizing the window; a slight hiccup at that time is not only passable, it’s typically expected. It’s the perfect and correct time to recalculate the coefficient for the texture and the UVs for the glyphs. Then you just pass them straight in from the glyph. It’s not like the UV changes for the glyph at any other time, since each SpriteFont has its own texture.
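A rough sketch of the glyph-holds-its-own-UVs idea, assuming a hypothetical `GlyphUv` struct and `GlyphUvCache.Create` helper (neither is part of SpriteFont). It keeps the pixel bounds alongside the UVs and computes the UVs once from the texture size, using the same 1f / width and 1f / height coefficients that cu and cv stand for.

```csharp
using System;

// Hypothetical glyph record: keeps the pixel bounds (still needed to size
// the quad) plus UVs computed once from the texture dimensions.
public struct GlyphUv
{
    public int X, Y, Width, Height;     // bounds in the texture, in pixels
    public float UvL, UvT, UvR, UvB;    // precomputed texture coordinates
}

public static class GlyphUvCache
{
    // Call once per glyph when the font texture is loaded (or replaced).
    public static GlyphUv Create(int x, int y, int width, int height,
                                 int textureWidth, int textureHeight)
    {
        float cu = 1f / textureWidth;    // same role as cu in the post
        float cv = 1f / textureHeight;   // same role as cv in the post
        return new GlyphUv
        {
            X = x, Y = y, Width = width, Height = height,
            UvL = x * cu,
            UvT = y * cv,
            UvR = (x + width) * cu,
            UvB = (y + height) * cv
        };
    }
}
```

Inside the draw loop the four multiplications per glyph then become plain field reads, e.g. glyph.UvL and glyph.UvT.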

Though note you still need the actual pixel values of the bounds here for rotation, so it needs an extra set of fields; I don’t think you can just replace bounds.

    Vector2 localDrawXY = new Vector2(currentGlyph.Cropping.X, currentGlyph.Cropping.Y) * scale + offset;
    Vector2 localDrawRB = localDrawXY + new Vector2(currentGlyph.BoundsInTexture.Width * scale.X, currentGlyph.BoundsInTexture.Height * scale.Y);
    offset.X = (currentGlyph.Width + currentGlyph.RightSideBearing) * scale.X + offset.X;

    // prep vertices to directly draw
    Vector2 lt = new Vector2(localDrawXY.X, localDrawXY.Y);
    Vector2 lb = new Vector2(localDrawXY.X, localDrawRB.Y);
    Vector2 rt = new Vector2(localDrawRB.X, localDrawXY.Y);
    Vector2 rb = new Vector2(localDrawRB.X, localDrawRB.Y);
    // rotate
    if (rotation != 0)
    {
        lt = new Vector2(lt.X * q.Y - lt.Y * q.X, lt.X * q.X + lt.Y * q.Y);
        lb = new Vector2(lb.X * q.Y - lb.Y * q.X, lb.X * q.X + lb.Y * q.Y);
        rt = new Vector2(rt.X * q.Y - rt.Y * q.X, rt.X * q.X + rt.Y * q.Y);
        rb = new Vector2(rb.X * q.Y - rb.Y * q.X, rb.X * q.X + rb.Y * q.Y);
    }

Also, origin is nothing special. Though XNA used it against the UV to alter the texel position, you can easily just use it against the position directly, which is what I’ve done conceptually here. That turns it into simple addition and subtraction.

    Vector2 localDrawXY = new Vector2(currentGlyph.Cropping.X, currentGlyph.Cropping.Y) * scale + offset - origin;
    // ... rotation code ...
    lt += position + origin;
    lb += position + origin;
    rt += position + origin;
    rb += position + origin;

However, this has ‘no meaning’ in the context of DrawString without SpriteEffects around, so I actually removed it entirely :) which is perfectly fine, as the position then IS the origin in all cases in the context of a DrawString. (It’s only in there because I was lazy and hadn’t yanked it out yet; in fact, I just noticed I forgot to.)
SpriteEffects, however, are horrible, as the context of the origin becomes unknowable without predetermining the size, which of course must be fully calculated in sequence.

Have you taken a look at charsource at all?

Does this actually do anything to improve speed, or anything at all? Or is it just a redundant and useless attempt to hack around C# value-to-char garbage generation? I still see no reason why it’s in there.

UE4 still uses LUTs for… for example, a Texture2D that stores roughness on Y and cos(theta-v) on X, instead of computing this on the GPU.
I also use LUTs, and they give better results when a lot of calculations are required (I use them in threads to parallelize movement computations).

Interesting, but here it’s just two calls, one to sin and one to cos, and even then that is conditional.

If called each frame it can impact performance: if you calculate q on each call of DrawText(), as seems to be the case since I don’t see any if/switch, and/or the text keeps rotating between frames, then sin/cos can cause overhead. LUTs for sin/cos are still faster than calling the methods, unless you need billions of values for high precision, in which case the search could be the bottleneck.
But the fps loss from enlarging the window would not be proportional to anything regarding sin/cos.
Maybe if you draw the text into a render target (if it doesn’t change between frames) and rotate that, you could gain some perf.
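For anyone who wants to test the LUT claim, here is a minimal nearest-entry sin/cos table. The `SinCosTable` name and the 4096-entry size are arbitrary assumptions, and the result is only accurate to roughly one table step (about 0.001 here). It is a sketch for benchmarking against Math.Sin and Math.Cos, not a claim that it is faster on any particular machine.

```csharp
using System;

public static class SinCosTable
{
    private const int Size = 4096;   // power of two, so wrapping is a bit mask
    private const float IndexPerRadian = Size / (2f * (float)Math.PI);
    private static readonly float[] Sines = Build();

    // Fills one full period of sine, sampled at Size evenly spaced angles.
    private static float[] Build()
    {
        var table = new float[Size];
        for (int i = 0; i < Size; i++)
            table[i] = (float)Math.Sin(i * 2.0 * Math.PI / Size);
        return table;
    }

    // Nearest-entry lookup. The mask also wraps negative angles correctly
    // because int uses two's complement.
    public static float Sin(float radians)
    {
        int i = (int)Math.Round(radians * IndexPerRadian) & (Size - 1);
        return Sines[i];
    }

    // cos(x) = sin(x + pi/2), i.e. a quarter-table offset.
    public static float Cos(float radians)
    {
        int i = ((int)Math.Round(radians * IndexPerRadian) + Size / 4) & (Size - 1);
        return Sines[i];
    }
}
```

Since DrawText only needs one sin and one cos per call, any gain from a table like this would likely be small next to the per-glyph work.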