[SOLVED] Color histogram of texture via shader

Dear all,

I am currently trying to perform some basic thermal analysis on a satellite. Basically, I need to calculate surface areas (view factors) as seen from the sun and Earth. A common way to do this is via an orthographic projection of the surface of interest as seen from, e.g., the sun.

In this plot, you can, for example, see the solar cells (light steel blue) as visible from the sun.

Knowing the size of a single pixel in this representation, I only need to count the pixels to get the surface area. So far, I have done this using a RenderTarget2D and a Color[] array which I filled via GetData<Color>() and cycled through. It works well, but is super slow.
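For completeness, my CPU-side counting is roughly the following (a minimal sketch; `renderTarget`, the target color, and `pixelArea` are stand-ins for my actual setup):

```csharp
// Read the rendertarget back to the CPU and count pixels of one color.
// GetData stalls the GPU pipeline and the loop runs serially -- hence the slowness.
Color[] pixels = new Color[renderTarget.Width * renderTarget.Height];
renderTarget.GetData(pixels);

Color surfaceColor = Color.LightSteelBlue; // color assigned to the surface of interest
int count = 0;
foreach (Color c in pixels)
    if (c == surfaceColor)
        count++;

double area = count * pixelArea; // pixelArea: area covered by a single pixel
```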

However, I believe that this task can also be solved on the GPU. I have read about atomic counters employed to get an image histogram, which is exactly what I need. In this tutorial, an atomic counter (or InterlockedAdd) is used to correct an image for luminosity.

However, I have failed to make it work so far. I checked the HLSL reference and saw that the function I probably need is only available for shader model 5 and newer (https://msdn.microsoft.com/en-us/library/windows/desktop/ff471406(v=vs.85).aspx), and have therefore changed the profile accordingly. Unfortunately, this is currently not possible with OpenGL 3.0 (it needs at least 4.2, I believe), so I am using DirectX.

Since I am very new to shaders, my approach is probably very naive (main routine of the pixel shader, using the standard notation as predefined in the MonoGame SpriteEffect template):

groupshared uint histogram[32];

float4 MainPS(VertexShaderOutput input) : COLOR
{
	// get bin to write to
	int bin = input.Color.r * 32;
	// atomic add
	InterlockedAdd(histogram[bin], 1);

	// for now, display the image
	return tex2D(SpriteTextureSampler, input.TextureCoordinates) * input.Color;
}

I get two error messages during compilation:
a) groupshared is not supported in ps_5_0 -> which shader level should I use?
b) Resource being indexed cannot come from conditional expressions … -> probably using histogram the wrong way?

Does anybody have experience in how to use this command properly? Or am I missing the point entirely and should use another way to solve this problem?

Thanks a lot in advance

This doesn’t work in MonoGame, sorry. Compute shaders are not supported; you will have to look for a different framework.

This was a quick (and unfortunate) reply. Thanks a lot though!

Maybe you can do it with a pixel shader, given that you only want to look for a maximum of four colors (not a full histogram, but good enough for your purpose).

You can have a 1x1 rendertarget (SurfaceFormat.Single up to SurfaceFormat.Vector4), and for that one pixel you can sample all the other pixels and count how many of them have the color you are looking for. Divide your result by the total number of pixels.

To reconstruct the integer value, you can read that pixel back in your C# code and multiply by the total number of pixels again.

I guess there might be an issue with the shader if you sample thousands of texels; that might not be supported. But in theory it’s no different from taking one sample per pixel.
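The C# side of this idea could look something like the following sketch. `countEffect` is a hypothetical pixel shader that loops over every texel of the source texture and outputs the fraction of pixels matching the target color:

```csharp
// A 1x1 rendertarget; with SurfaceFormat.Vector4 one pixel can hold
// normalized counts for up to four colors (one per channel).
var countTarget = new RenderTarget2D(graphicsDevice, 1, 1, false,
    SurfaceFormat.Vector4, DepthFormat.None);

graphicsDevice.SetRenderTarget(countTarget);
spriteBatch.Begin(SpriteSortMode.Immediate, BlendState.Opaque,
    null, null, null, countEffect);
spriteBatch.Draw(sourceTexture, new Rectangle(0, 0, 1, 1), Color.White);
spriteBatch.End();
graphicsDevice.SetRenderTarget(null);

// Read the single pixel back and undo the normalization.
var result = new Vector4[1];
countTarget.GetData(result);
float matchingPixels = result[0].X * sourceTexture.Width * sourceTexture.Height;
```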

Will it still get parallelized in this case?


Actually, for one output pixel, probably four pixels would be calculated anyway, maybe even 16, depending on the GPU.

Maybe some form of downsampling with min/maxing or something similar could be used.

We really need to get compute shaders implemented.


If I understand it correctly, the calculation would be performed serially on the GPU instead of on the CPU. In that case, wouldn’t the CPU do the faster job?

I just checked: InterlockedAdd is in principle supported in pixel shaders for shader model 5.0 and above.

Another approach would be to reduce the size of the image you are grabbing via GetData() so that it’s no longer so slow. A multi-pass solution should get you quite close.

For the first pass create a RenderTarget with dimensions 1/4 the size of the original image and SurfaceFormat Single. Have your shader sample a 4x4 block (16 samples) of original pixels and output a count of the number of matching pixels. For instance, if your original image is 1024x1024, you should then end up with a 256x256 image where each pixel has a 0…16 value.

For the second through nth passes do the same kind of thing. You will need a different shader since your input will be a count of the previous level instead of a color. Just sum the 16 sampled values and output that value.

Repeat this until you either have a 1x1 image or an image with a dimension that is no longer divisible by 4. At that point, use GetData() to get the results and sum them to get your answer. Since the image you are calling GetData() on will be small, it should be quite quick.

Even with multiple passes this should easily run at real-time rates.
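If it helps, here is a sketch of what the C# driver loop for that reduction might look like. `matchPass` and `sumPass` are hypothetical effects (the first outputs a 0–16 count of matching pixels per 4x4 block of the color image, the second sums 4x4 blocks of counts), and `DrawFullscreenQuad` is a hypothetical helper that renders the source into the target with the given effect:

```csharp
// First pass: color image -> per-block match counts at 1/4 resolution per dimension.
int w = source.Width / 4, h = source.Height / 4;
RenderTarget2D current = new RenderTarget2D(graphicsDevice, w, h, false,
    SurfaceFormat.Single, DepthFormat.None);
DrawFullscreenQuad(source, current, matchPass);

// Further passes: keep summing 4x4 blocks while the dimensions allow it.
while (w % 4 == 0 && h % 4 == 0 && w >= 4 && h >= 4)
{
    w /= 4; h /= 4;
    var next = new RenderTarget2D(graphicsDevice, w, h, false,
        SurfaceFormat.Single, DepthFormat.None);
    DrawFullscreenQuad(current, next, sumPass);
    current = next;
}

// Final small image: read back and sum on the CPU.
var counts = new float[w * h];
current.GetData(counts);
float total = 0;
foreach (float c in counts) total += c;
```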


Hmm, sounds interesting and promising, but I am not sure if I understand it correctly:

  1. First, I create the original image and pass it to the (pixel) shader. The shader samples a block of 4x4 pixels and writes the number of matching pixels into a new rendertarget 1/4 the height and 1/4 the width of the original image.
  • First question here: how do I know which pixel of the rendertarget the shader is working on? I need to somehow decide which pixels of the original, large image to work on.
  2. Repeat the process with a further reduced image, e.g. 1/16 the height and width of the original one, using the previous rendertarget as input. Find the point where it’s faster to use GetData() instead of going through the shader routine again.

Since I need to count more than one color at a time, I could either use a different surface format and write into the four channels (RGBA), or use offsets, e.g. color 1 is counted as “1” while color 2 is counted as “16”. Or maybe there’s a smarter way.

Update: Do I just sample around my current texture coordinate, knowing the resolution of the input image?

OcclusionQuery might work for you.

First draw your satellite to fill the depth buffer. Then redraw each surface and count the number of drawn pixels with a query.
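In MonoGame that is roughly the following (a sketch; the two draw calls are placeholders for your own rendering code):

```csharp
var query = new OcclusionQuery(graphicsDevice);

DrawSatellite();              // first pass: fill the depth buffer

query.Begin();
DrawSurfaceOfInterest();      // redraw just the surface to be measured
query.End();

// The result arrives asynchronously, typically a frame or two later;
// in real code, poll IsComplete instead of spinning like this.
while (!query.IsComplete) { }

int visiblePixels = query.PixelCount;
```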


I was not too successful getting the shader version to run smoothly, but OcclusionQuery seems to work fine. I now have to check how the performance compares to the CPU-based histogram in the case of many objects. Will let you know once I have something.

Update: What a speedup! I compared an Intel i7-4770K at 3.5 GHz with an Nvidia GT 640. The image is 850x850 px and contains 13 different colors whose areas need to be determined.

Without computing the histogram: 1395 fps.
CPU computes histogram: 48 fps.
GPU computes histogram: 629 fps.


Good call by nkast. I had completely forgotten about OcclusionQuery.

To answer your question about the shader: when you do this kind of filtering, you generally “render” two triangles to cover the whole rendertarget. The vertices need to include UV coordinates, which you can then manipulate to generate the samples you need. Typically I will render the triangles with UV coordinates going from 0 to 1. I will then also pass a float2 parameter to the shader which contains 1.0 over the source texture resolution. This value is the offset you can use in the sampler to adjust the sample position by a single texel.

float2 TexelSize; // 1 / texture resolution

float4 center       = tex2D(Sampler, uv);
float4 texelToRight = tex2D(Sampler, uv + float2(TexelSize.x, 0));
float4 texelToLeft  = tex2D(Sampler, uv - float2(TexelSize.x, 0));

As far as dealing with multiple colors goes, you could count up to four at a time this way by using the Vector4 SurfaceFormat for your rendertargets.
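On the C# side, the texel-size parameter is just set once per source resolution (a sketch, assuming your effect declares `TexelSize` as above):

```csharp
// One source texel expressed in 0..1 UV space: adding this to the UV
// coordinates moves the sample position by exactly one texel.
filterEffect.Parameters["TexelSize"].SetValue(
    new Vector2(1f / sourceTexture.Width, 1f / sourceTexture.Height));
```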



Thanks @scoy, I will try to implement it for comparison. For now, however, I will stick to the occlusion query, since there I am free in terms of resolution.