Efficiently drawing many rounded rectangles

magneticmuesli · October 28, 2020, 5:56pm

Is it possible to get the width and height of the current “fragment” in a pixel shader?

I’m writing a pixel shader that can draw rounded rectangles. My problem is I need the size of the rectangles in the shader in order to calculate the roundness of the corners properly.

I could, of course, create a global variable in the shader and update it with the size of every rectangle I want to draw, but that would mean I can’t combine the vertices of multiple rectangles and draw them all in one batch, right?

…or am I misunderstanding something? My understanding of the graphics pipeline is limited.

Thanks!

EDIT:
The original title of this thread was “Getting the size of the current fragment?”, now updated to reflect what I’m actually trying to achieve.

GoldenThumbs · October 28, 2020, 9:42pm

Without getting too technical, a fragment is more or less analogous to a pixel. From the sounds of it you want something like a uniform that passes the size of your rectangle in pixels. Am I correct?

In case you actually are looking for the fragment position, GLSL has a built-in variable for it called “gl_FragCoord”. Perhaps looking for the HLSL equivalent online can help?

Edit: This may be of help https://docs.microsoft.com/en-us/windows/uwp/gaming/glsl-to-hlsl-reference

magneticmuesli · October 28, 2020, 10:36pm

Aaah thanks for your reply and please excuse my ignorance!

So fragments aren’t what I’m looking for then… (is gl_FragCoord analogous to SV_Position?)

In my understanding, a pixel shader (that sits in tandem with a vertex shader) is not necessarily processing every pixel on the screen, only the ones covered by whatever the vertex shader feeds it?

For instance, if my screen/canvas is 800 x 600 and I’m drawing a mesh forming a 20 x 20 rectangle in the corner (x = 0, y = 0), then the pixel shader will only get these 20 x 20 pixels to process?

If this is roughly how it works, is there a way to know in the pixel shader that the pixel it’s currently processing is part of this 20 x 20 piece?

Thanks!

GoldenThumbs · October 28, 2020, 10:44pm

It’s fine lol. Keep in mind I’m no expert either, I just program a lot of shaders. Going to answer your questions in order.

1. According to the docs I posted a link to, yeah, SV_Position is basically gl_FragCoord. I’d imagine you’d still need the fragment position for this type of shader.
2. Yes, I’m fairly certain this is how it works.
3. I’m not sure. Maybe via a matrix transformation of some sort?

Why are you trying to do this via shader though? It might be faster if done on the CPU (this is a guess, GPU could easily still be faster) but it would most certainly be easier.

markus · October 28, 2020, 10:51pm

Since a constant shader parameter is not good enough in your case, you have to pass the size information down from the vertex shader. So your vertex shader has to output the rectangle size and your pixel shader needs it as an input.

Unfortunately you can’t calculate the rectangle size in the vertex shader directly, because vertices don’t have access to their neighbours. You have to add the size data to the vertices when you create the vertex buffer in C#, just like you would add texture coordinates, or normals, or whatever. This is a bit inefficient, because you are adding the size to all 4 vertices of the rectangle, but then again, your total vertex count is probably low enough for this to not be an issue.
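
A minimal sketch of that pass-through, for illustration only. It assumes the size sits in each vertex under TEXCOORD1 and that the effect has a WorldViewProjection matrix. The names VSInput, VSOutput, VS_RoundedRect and WorldViewProjection are placeholders, not taken from any shader posted in this thread. Radius and Color stay global here, so they are still shared by the whole batch.

```hlsl
// Sketch only: struct, function and matrix names are assumptions.
float4x4 WorldViewProjection;
float Radius;
float4 Color;

struct VSInput
{
    float4 Position : POSITION0;
    float2 UV       : TEXCOORD0; // 0..1 across the rectangle
    float2 Size     : TEXCOORD1; // rectangle size in pixels, same value on every corner
};

struct VSOutput
{
    float4 Position : POSITION0; // SV_POSITION on Direct3D 10+ profiles
    float2 UV       : TEXCOORD0;
    float2 Size     : TEXCOORD1;
};

VSOutput VS_RoundedRect(VSInput input)
{
    VSOutput output;
    output.Position = mul(input.Position, WorldViewProjection);
    output.UV = input.UV;
    // All corners carry the same size, so interpolation leaves it unchanged.
    output.Size = input.Size;
    return output;
}

// Distance from a point (relative to the rectangle center) to the rounded edge
float roundedRectSDF(float2 centerPosition, float2 size, float radius)
{
    return length(max(abs(centerPosition) - (size / 2) + radius, 0)) - radius;
}

float4 PS_RoundedRect(VSOutput input) : COLOR0
{
    // UV (0..1) to pixel position inside this rectangle
    float2 pixelPos = input.UV * input.Size;
    float dist = roundedRectSDF(pixelPos - input.Size / 2, input.Size, Radius);
    clip(0.01 - dist); // discard pixels outside the rounded shape
    return Color;
}

technique RoundedRectangle
{
    pass Pass1
    {
        VertexShader = compile vs_3_0 VS_RoundedRect();
        PixelShader = compile ps_3_0 PS_RoundedRect();
    }
};
```

On the C# side the vertex declaration would need a matching Vector2 element with usage TextureCoordinate and usage index 1, so that it lines up with TEXCOORD1.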

Alternatively you could add only a rectangle index to every vertex, and then have a global array with all the rectangle sizes.
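
A rough sketch of the index variant, under a few assumptions: the index is stored as a float in TEXCOORD1, the array is capped at a made-up MAX_RECTS, and the lookup happens in the vertex shader, because indexing a constant array with a varying index is generally a better fit for vs_3_0 than for ps_3_0. RectSizes, RectIndex and VS_IndexedRect are placeholder names.

```hlsl
// Sketch only: MAX_RECTS and all names here are assumptions.
#define MAX_RECTS 64

float4x4 WorldViewProjection;
float2 RectSizes[MAX_RECTS]; // one entry per rectangle, uploaded from C#

struct VSInput
{
    float4 Position  : POSITION0;
    float2 UV        : TEXCOORD0;
    float  RectIndex : TEXCOORD1; // which entry of RectSizes this vertex belongs to
};

struct VSOutput
{
    float4 Position : POSITION0; // SV_POSITION on Direct3D 10+ profiles
    float2 UV       : TEXCOORD0;
    float2 Size     : TEXCOORD1; // looked up here, consumed by the pixel shader
};

VSOutput VS_IndexedRect(VSInput input)
{
    VSOutput output;
    output.Position = mul(input.Position, WorldViewProjection);
    output.UV = input.UV;
    // Vertex inputs are not interpolated, so the cast recovers the exact index.
    output.Size = RectSizes[(int)input.RectIndex];
    return output;
}
```

The trade-off is that the array size limits how many rectangles fit in one batch, since shader constant space is limited.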

magneticmuesli · October 28, 2020, 11:13pm

Here’s my rounded rectangle shader as it is working now:

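float2 Size;   // rectangle size in pixels
float Radius;  // corner radius in pixels
float4 Color;  // fill color returned by the pixel shader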

struct PixelInput
{
    float4 Position : TEXCOORD;
};

//Signed Distance Field function
//Returns how far a pixel is from the edge of my rounded rectangle shape
float roundedRectSDF(float2 centerPosition, float2 size, float radius)
{
    return length(max(abs(centerPosition) - (size / 2) + radius, 0)) - radius;
}

// Pixel shader
float4 PS_RoundedRect(PixelInput input) : COLOR0
{
    //Convert our UV coordinates (which go from 0 to 1) to pixel positions relative to the rectangle
    float2 pixelPos = float2(input.Position.x * Size.x, input.Position.y * Size.y);

    // Calculate distance to edge
    float distance = roundedRectSDF(pixelPos - (Size / 2.0f), Size, Radius);

    //discard pixels that are outside our rounded rectangle shape
    clip(0.01 - distance);

    //Return the remaining pixels (and give them color)
    return Color;
}

technique RoundedRectangle
{
    pass Pass1
    {
        PixelShader = compile ps_3_0 PS_RoundedRect();
    }
};

magneticmuesli · October 28, 2020, 11:36pm

I’m not sure the absolute screen position is useful in this case, or if it is, how I could use it. I do need the relative coordinates though which I can calculate given the size and UV coordinates… but then the size is what I’m trying to get in the first place… But then this is just the way of doing it that I have managed to hack together, I’m sure there are other ways.

Ok phew…

Hmmm, I thought it must be way faster to do this with the GPU but I’m curious how it could be done efficiently on the CPU? I guess it doesn’t make a big difference in my tiny 20x20 example but I plan to use hundreds of these rectangles in all kinds of big and small sizes… You reckon the CPU would be a good alternative? And if so, roughly how?

Thanks!

magneticmuesli · October 29, 2020, 12:00am

Thanks for taking the time to answer @markus !

I’m not sure tbh, it might be totally good enough… this is how I have got it working now and maybe it’s fine, but somehow it doesn’t feel elegant having to set constants and send vertices for one rectangle at a time…

Aha, I see

This actually occurred to me as a solution as well and I spent a while trying this approach but couldn’t make it work. OpenGL got upset as soon as I tried to read from input.TextCoord1, input.Position1 or input.Color1 or whichever label I used to send my size vector.

And then again, like you say, it also doesn’t seem optimal to send this vector to six vertices when it’s only needed once per rectangle…

Aha! Sounds interesting.

Thanks

markus · October 29, 2020, 9:38am

It’s fine if you draw a couple of rounded rectangles every frame.

With hundreds of rectangles it’s not fine anymore, and this has nothing to do with elegance. Hundreds of separate draw calls per frame will have a serious impact on performance; the low vertex count doesn’t matter. Or do you not need to render them every frame (e.g. render to a render target once, then reuse)?

markus · October 29, 2020, 9:51am

How dynamic are those rectangles? If you upload the vertex buffer once, and then keep it, this is not an issue. Not for a few hundred rectangles, or even many thousands of rectangles.

If you need to reupload the vertex buffer every frame, it’s still fine for a few hundred, and probably a few thousand rectangles.

EDIT:
If you really want to avoid the data duplication you could also use hardware instancing. That way you only need to send the size once for every rectangle.
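
A sketch of what the shader side of instancing could look like. The assumptions: one vertex buffer holds a single unit quad, a second buffer holds one entry per rectangle, and the rectangle position, size and color arrive as per-instance inputs. The stream layout and every name here (Corner, RectPosition, RectSize, RectColor, ViewportSize, VS_Instanced) are made up for illustration.

```hlsl
// Sketch only: stream layout and names are assumptions.
float2 ViewportSize; // back buffer size in pixels, set from C#
float Radius;

struct VSInput
{
    // Stream 0, per vertex: corner of a unit quad, (0,0) to (1,1)
    float2 Corner       : POSITION0;
    // Stream 1, per instance: one entry per rectangle
    float2 RectPosition : TEXCOORD1; // top-left corner in pixels
    float2 RectSize     : TEXCOORD2; // width and height in pixels
    float4 RectColor    : COLOR0;
};

struct VSOutput
{
    float4 Position : POSITION0; // SV_POSITION on Direct3D 10+ profiles
    float2 UV       : TEXCOORD0;
    float2 Size     : TEXCOORD1;
    float4 Color    : COLOR0;
};

VSOutput VS_Instanced(VSInput input)
{
    VSOutput output;
    // Place the unit quad corner inside this instance's rectangle (pixels)
    float2 pixel = input.RectPosition + input.Corner * input.RectSize;
    // Pixels to clip space: x 0..W maps to -1..1, y 0..H maps to 1..-1
    float2 ndc = pixel / ViewportSize * float2(2, -2) + float2(-1, 1);
    output.Position = float4(ndc, 0, 1);
    output.UV = input.Corner;
    output.Size = input.RectSize;
    output.Color = input.RectColor;
    return output;
}

float roundedRectSDF(float2 centerPosition, float2 size, float radius)
{
    return length(max(abs(centerPosition) - (size / 2) + radius, 0)) - radius;
}

float4 PS_Instanced(VSOutput input) : COLOR0
{
    float2 pixelPos = input.UV * input.Size;
    float dist = roundedRectSDF(pixelPos - input.Size / 2, input.Size, Radius);
    clip(0.01 - dist); // discard pixels outside the rounded shape
    return input.Color;
}

technique RoundedRectangleInstanced
{
    pass Pass1
    {
        VertexShader = compile vs_3_0 VS_Instanced();
        PixelShader = compile ps_3_0 PS_Instanced();
    }
};
```

In MonoGame the two buffers would be bound together as VertexBufferBinding entries, with an instance frequency of 1 on the per-rectangle buffer, and drawn with GraphicsDevice.DrawInstancedPrimitives. The DirectX platform will likely need a vs_4_0 style profile instead of vs_3_0.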

magneticmuesli · October 29, 2020, 10:34am

Hehe ok got ya!

Hmm moderately. They will be the building blocks of my gui framework and I would love to be able to animate some transitions and so on in the future.

Ooh! This sounds like exactly the right thing!

Thanks!

magneticmuesli · October 31, 2020, 2:04pm

I implemented hardware instancing and there seems to be no real difference in performance drawing a hundred rectangles, but I get roughly four times the fps at 1k rectangles and at 10k rectangles the results are not remotely comparable.

…but this is in an otherwise empty MonoGame project. I’d imagine that the more game logic is introduced and the more CPU-bound the game becomes, the bigger the performance difference will become, even at lower rectangle counts. Is this correct?

Thanks

markus · October 31, 2020, 6:29pm

The first thing to check would be if the CPU is already fully loaded at 100 rectangles. If it’s not fully loaded then something else is the limiting factor, which could explain this. If vertical sync is enabled, for example, the framerate is limited to the refresh rate of your monitor, which could explain why there is no difference in framerate.

You won’t get 100% CPU usage, because your project will mostly use one core/thread. If you have a 6 core/12 thread CPU, for example, you should see at least 100/12 ≈ 8.3% CPU load from your project, probably a bit more because some multithreading is already happening even without you coding it.

Bob0one · November 3, 2020, 12:20am

“but somehow it doesn’t feel elegant having to set constants and send vertices for one rectangle at a time…”

You can do this entirely in the GPU. You shouldn’t render per rectangle for an effect like this. That slaughters the parallelism of the CPU and GPU.

So the same way you set up individual corner coordinates for your rectangles, you should be able to set up the size. It’s some duplicate data, but that’s not a big deal. Create another coordinate channel in each vertex, populate it with the size, and pass it into the pixel shader from the vertex shader. Each “coordinate” is the same so interpolation doesn’t matter.

Something like:

Vertex:
x,y,z,w
u,v
r,g,b,a
Size_x,size_y

Alternatively stream another vertex buffer with the 4 sizes that parallels each vertex in your quad stream. This would interleave your vertex buffers:

Q2_tl | q2_size
Q2_tr | q2_size
Q2_br | q2_size
Q2_bl | q2_size

That’s a bit of redundant data (2x3x4=24 bytes per quad) but you get size data throughout the pipeline. You can access it in your shader. That lets you do instancing and have the metadata you need while being friendly to memory locality and parallelism.

willmotil · November 3, 2020, 1:54am

Yeah, I think you could also do this entirely in the pixel shader with a straightforward formula.
It would be easier with quads.
You could probably do it with SpriteBatch, but it would be clunky: you would have to force the vertex shader to be used, and you would need to write one for SpriteBatch as well.

Basically, make a formula that determines the difference of the UV samples from the center UV:
diff = texCords - float2( .5,.5 ); // the range is -.5 to 0 to +.5
Then find a curve equation that fits what you want: as the absolute value of the x or y difference approaches the end of its range, scale the value outwards (i.e. increase it). That should be an exponential curve that varies only a little until it approaches the extremes (+.5 or -.5). Then renormalize by adding .5 back.
Finally, if the x or y (i.e. u or v) is less than 0 or more than 1, just use a transparent color; otherwise sample the texture.

that should give a rounded rectangle

If you need finer control over the curvature, so that each draw is uniquely controllable, you need to input control points and use a polynomial of one type or another (quadratic, Bezier, Hermite, etc.), feed each vertex the control points for interpolation, and then pass those through to the pixel shader.
Or just hard-code it with, say, a degree-2 (quadratic) polynomial with a control point of about .45 and a scalar of about 1.1 to scale it out.
You might be able to do all of that in the vertex shader, but I’m too groggy right now to figure out in my head whether it’s possible without actually trying it.

Note that this would essentially warp a texture into a rounded image rather than cut out texels: only samples that land outside the texture would be excluded, as they should be, and the UV sample texcoords would be warped.
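
One possible reading of the idea above, as a hedged sketch. The curve is a stand-in picked for illustration, not the formula described in this post: each axis is pushed outwards by a power of the other axis, so only samples near the corners leave the 0..1 range. Bulge, Sharpness and the sampler names are assumptions.

```hlsl
// Sketch only: the curve and all parameter names are assumptions.
Texture2D SpriteTexture;
sampler2D SpriteTextureSampler = sampler_state
{
    Texture = <SpriteTexture>;
};

float Bulge = 0.5;     // how far corner samples are pushed outwards
float Sharpness = 8.0; // higher keeps the edges straight for longer

float4 PS_WarpedRound(float2 texCoords : TEXCOORD0) : COLOR0
{
    // Offset from the center UV, range -.5 to +.5
    float2 diff = texCoords - float2(0.5, 0.5);

    // Scale each axis outwards as the other axis nears its extreme.
    // The factor is 1 at edge midpoints and 1 + Bulge at the corners.
    float2 push = 1.0 + Bulge * pow(abs(2.0 * diff.yx), Sharpness);

    // Renormalize back to UV space
    float2 uv = diff * push + float2(0.5, 0.5);

    // Sample first, then select, so the texture fetch is not inside a branch
    float4 texel = tex2D(SpriteTextureSampler, uv);
    bool outside = any(uv < 0.0) || any(uv > 1.0);
    return outside ? float4(0, 0, 0, 0) : texel;
}

technique WarpedRound
{
    pass Pass1
    {
        PixelShader = compile ps_3_0 PS_WarpedRound();
    }
};
```

Alpha blending has to be enabled for the transparent corners to show. The corner shape depends on the UV range rather than on pixel sizes, so non-square rectangles would get stretched corners.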