Image detection in a screenshot

Hi all,
I've written an HLSL shader to recognize a fixed image in a screenshot.
It works, but very slowly. I think my technique is too brute-force: for each pixel processed by my pixel shader, I look for the image in the surrounding pixels with nested "[loop]for x" and "[loop]for y" loops.
I was hoping to find several images at once, but I get under 10 FPS when searching for just one!
Is there a better way to do this?
Here is my HLSL code:

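```hlsl
// Search parameters set by the application
float Screen_X;      // screenshot width in pixels
float Screen_Y;      // screenshot height in pixels
float Speed;         // sampling step in pixels (1 = compare every pixel)
float Zoom;          // scale applied to the object texture coordinates
float2 TailleObj;    // object size in pixels (x = width, y = height)

Texture2D Texture : register(t0);
sampler2D TextureSampler : register(s0) = sampler_state
{
	Texture = <Texture>;
	Filter = POINT;
};

Texture2D xCapture;  // the screenshot
sampler2D TexCapture = sampler_state
{
	Texture = <xCapture>;
	Filter = POINT;
};

Texture2D xCompare;  // the object to find (transparent around the object)
sampler2D TexCompare = sampler_state
{
	Texture = <xCompare>;
	Filter = POINT;
};

float4 PS(float4 position : SV_Position, float4 color : COLOR0, float2 TexCoord : TEXCOORD0) : SV_TARGET0
{
	float Object_X = TailleObj.x;
	float Object_Y = TailleObj.y;

	// Screenshot area the object would cover if its top-left corner were at this pixel
	float Deb_X = TexCoord.x;
	float Deb_Y = TexCoord.y;
	float Fin_X = Deb_X + Object_X / Screen_X;
	float Fin_Y = Deb_Y + Object_Y / Screen_Y;

	float4 C = tex2D(TexCapture, TexCoord);
	float4 C2;

	int Trouve = 0;  // number of matching pixels
	int Compte = 0;  // number of opaque object pixels tested

	[loop]for (float y = Deb_Y; y <= Fin_Y; y += Speed / Screen_Y)
	{
		[loop]for (float x = Deb_X; x <= Fin_X; x += Speed / Screen_X)
		{
			float2 ObjectCoord = float2(Zoom * (x - Deb_X) / (Fin_X - Deb_X), Zoom * (y - Deb_Y) / (Fin_Y - Deb_Y));
			C2 = tex2D(TexCompare, ObjectCoord);
			if (C2.a > 0.0f)  // skip the transparent part of the object texture
			{
				Compte++;
				float4 C1 = tex2D(TexCapture, float2(x, y));
				if ((abs(C1.r - C2.r) <= 0.05f) && (abs(C1.g - C2.g) <= 0.05f) && (abs(C1.b - C2.b) <= 0.05f))
					Trouve++;
			}
		}
	}

	// Paint the pixel red when more than 20% of the tested pixels match
	if (Trouve > 0.2f * Compte)
		C = float4(1, 0, 0, 1);

	return C;
}

technique FindObject
{
	pass Pass0
	{
		PixelShader = compile ps_5_0 PS();
	}
}
```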

Some variables are fixed (zoom, speed, picture size); I hope the code is understandable.
Speed is the granularity of the search: the step, in pixels, between compared pixels, so a larger value makes the search faster but coarser.
The image counts as found if at least 20% of the pixels are equal, with a tolerance of 0.05 on each of the R, G and B channels.
Note: the images I look for are usually between 50x50 and 80x80 pixels.
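To make the test described above concrete, here is a minimal CPU-side sketch in Python of what each shader invocation computes. It assumes images are lists of rows of (r, g, b, a) floats in [0, 1]; the names `match_ratio` and `find_matches` are hypothetical, and `step` plays roughly the role of `Speed`. Unlike the shader, which samples past the screenshot edge, this sketch only tests positions where the whole object fits.

```python
# CPU reference for the per-position test (a sketch, not the shader itself).
# Images are lists of rows; each pixel is an (r, g, b, a) tuple of floats in [0, 1].

TOLERANCE = 0.05   # per-channel tolerance from the post
MIN_RATIO = 0.2    # more than 20% of the opaque object pixels must match

def pixels_match(p, q, tol=TOLERANCE):
    """True if R, G and B all differ by at most tol."""
    return all(abs(p[c] - q[c]) <= tol for c in range(3))

def match_ratio(screen, template, x0, y0, step=1):
    """Fraction of opaque template pixels that match the screen at top-left (x0, y0).

    Only every step-th row and column of the template is compared.
    """
    found = counted = 0
    for ty in range(0, len(template), step):
        for tx in range(0, len(template[0]), step):
            t = template[ty][tx]
            if t[3] > 0.0:                       # ignore transparent template pixels
                counted += 1
                if pixels_match(screen[y0 + ty][x0 + tx], t):
                    found += 1
    return found / counted if counted else 0.0

def find_matches(screen, template, step=1):
    """Brute-force scan: every top-left position whose ratio exceeds MIN_RATIO."""
    h, w = len(template), len(template[0])
    return [(x, y)
            for y in range(len(screen) - h + 1)
            for x in range(len(screen[0]) - w + 1)
            if match_ratio(screen, template, x, y, step) > MIN_RATIO]
```

Note that a 20% threshold also accepts partial overlaps. In a black screenshot containing one 2×2 white block, searching for a 2×2 white object returns nine positions, because a single matching pixel out of four already exceeds 20%.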

I hope that you will be able to help me.


For something like this, I would suggest using something like ML.Net.

Can you insert a break in your loop once you've identified just 10 or 50 pixels? Or maybe just test every nth pixel. Sure, you get lower resolution, but you gain speed, and I imagine the resolution would still be enough. Like fingerprints: you don't need every atom and fold accounted for, just enough for a unique ID.

That is, once you have what you need to identify the image, end the loop instead of completing it.
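The early-exit idea can be sketched as follows, as a Python CPU reference using the same (r, g, b, a) image representation. The function name and default values are hypothetical. It compares only every `step`-th pixel and returns as soon as the outcome is decided in either direction.

```python
def is_match_early_exit(screen, template, x0, y0, step=2, tol=0.05, min_ratio=0.2):
    """Decide 'more than min_ratio of opaque pixels match' without always scanning everything."""
    # Opaque template pixels on a coarse grid (every step-th row and column).
    coords = [(tx, ty)
              for ty in range(0, len(template), step)
              for tx in range(0, len(template[0]), step)
              if template[ty][tx][3] > 0.0]
    threshold = min_ratio * len(coords)    # need strictly more matches than this
    found = 0
    for i, (tx, ty) in enumerate(coords):
        s, t = screen[y0 + ty][x0 + tx], template[ty][tx]
        if all(abs(s[c] - t[c]) <= tol for c in range(3)):
            found += 1
            if found > threshold:
                return True                # enough matches already: stop early
        remaining = len(coords) - i - 1
        if found + remaining <= threshold:
            return False                   # cannot reach the ratio anymore: stop early
    return False
```

With a 20% threshold, the early-fail branch can only trigger after about 80% of the samples have been checked. Most of the saving therefore comes from `step` and from the early-success return, and that return gives up the exact match percentage.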

Hi Mando,
Thank you for your suggestion.
Yes, I have tried inserting a break and it works, but I need the full match percentage, which is important for the next step after this search.
Thanks again

Hello Charles,
What is ML.Net?
I clicked on the link but nothing appeared, so I'll search on Google.

Sorry, I didn't put a link; the editor must have added it.

It's Microsoft's machine learning framework.

This is a link :)

Wow... OK, I see now. But do you think my FPS will be good if I ask ML to tell me what is on the screen?
Even if I train it on all the images I want to detect, the answer may come slower than 10 FPS.

Just so you know: before writing the HLSL GPU shader, I tried my detection with classic CPU-only code, and searching for a single image took around 2 hours.

So, shouldn't you start by checking every pixel of the search area against the first (top-left) pixel of the object texture?

Only if the first pixel matches would you bother testing the rest. That part you can easily do at 60 fps, at least at 1920 × 1080, judging from my limited experience.

Then, if you get a hit, you would THEN, and only then, check the next pixel, and so forth.

Unless you want to account for image cropping, rotation, color changes, etc.
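Here is a minimal two-pass sketch of the anchor-pixel idea, written in Python as a CPU reference (function names are hypothetical). One deviation, labeled as an assumption: the anchor is the first opaque template pixel rather than the literal top-left one, since a transparent top-left pixel says nothing about the object. The sketch also assumes an exact match within the tolerance, so a hidden anchor pixel means the object is missed.

```python
def anchor_candidates(screen, template, tol=0.05):
    """Pass 1: top-left positions where the template's anchor pixel matches the screen."""
    # Anchor = first opaque template pixel in row-major order (assumes at least one exists).
    ax, ay = next((tx, ty)
                  for ty, row in enumerate(template)
                  for tx, p in enumerate(row) if p[3] > 0.0)
    anchor = template[ay][ax]
    h, w = len(template), len(template[0])
    return [(x, y)
            for y in range(len(screen) - h + 1)
            for x in range(len(screen[0]) - w + 1)
            if all(abs(screen[y + ay][x + ax][c] - anchor[c]) <= tol for c in range(3))]

def full_match(screen, template, x, y, tol=0.05):
    """Pass 2 test: every opaque template pixel must match within tol."""
    return all(all(abs(screen[y + ty][x + tx][c] - p[c]) <= tol for c in range(3))
               for ty, row in enumerate(template)
               for tx, p in enumerate(row) if p[3] > 0.0)

def two_pass_search(screen, template, tol=0.05):
    """Cheap anchor test everywhere, full test only on the surviving candidates."""
    return [(x, y) for x, y in anchor_candidates(screen, template, tol)
            if full_match(screen, template, x, y, tol)]
```

On a screenshot with little repetition, the first pass discards most positions after a single comparison. On a landscape full of similar colours, many positions survive it and the gain shrinks.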

Yes, you need to improve the algorithm.

The way I would do it would be to break the image you need to detect down into small cells.

Then for each cell I would calculate detection variables.

For example, average colour, average luminance, that sort of thing. Try to keep it to 4 values.

Store the results.

Then write a shader that does the same calculation for the image you are searching, subtracts the stored values from the result, and stores the difference in another texture as a single float.

That would give you a sort of probability map of the image you are searching.

Then you can run efficient software filters to find what you need. There are several very good software libraries for that sort of thing, so there is no need to reinvent the wheel.
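Here is a small sketch of the cell-descriptor approach, written in Python as a CPU reference rather than a shader. Several assumptions are not from the post: four values per cell (mean R, G, B and Rec. 709 luminance), a hypothetical cell size `CELL`, and a plain sum of absolute differences as the single float. Alpha is ignored, so transparent template pixels are averaged like any other. Lower values in the resulting map mean a closer match.

```python
CELL = 4  # hypothetical cell size in pixels

def luminance(p):
    """Rec. 709 luma weights (an assumption; any fixed weighting works for this sketch)."""
    return 0.2126 * p[0] + 0.7152 * p[1] + 0.0722 * p[2]

def cell_features(img, x0, y0, size=CELL):
    """Four values for the size x size cell at (x0, y0): mean R, G, B and mean luminance."""
    pix = [img[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    n = len(pix)
    return (sum(p[0] for p in pix) / n,
            sum(p[1] for p in pix) / n,
            sum(p[2] for p in pix) / n,
            sum(luminance(p) for p in pix) / n)

def feature_map(img, size=CELL):
    """The same four values for a cell starting at every pixel of the screenshot."""
    return [[cell_features(img, x, y, size) for x in range(len(img[0]) - size + 1)]
            for y in range(len(img) - size + 1)]

def difference_map(screen, template, size=CELL):
    """One float per top-left position: summed |feature difference| over the template cells."""
    t_cells = [(cx, cy, cell_features(template, cx, cy, size))
               for cy in range(0, len(template) - size + 1, size)
               for cx in range(0, len(template[0]) - size + 1, size)]
    f = feature_map(screen, size)   # computed once, then only looked up
    h, w = len(template), len(template[0])
    return [[sum(abs(a - b)
                 for cx, cy, tf in t_cells
                 for a, b in zip(f[y + cy][x + cx], tf))
             for x in range(len(screen[0]) - w + 1)]
            for y in range(len(screen) - h + 1)]
```

Because `feature_map` is built once and then only looked up, each candidate position costs one comparison per cell instead of one per pixel. Picking minima in the resulting map is the "probability map" filtering step.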

The image sits in the middle of the square, surrounded by transparent pixels, so I can't stop after the first comparisons, which fall in the transparent part of the mask. Also, the image can appear at different sizes and can be partly hidden; that's why I have to compare the whole square.

I can't precompute averages either, because the images blend almost seamlessly into the screenshot background. Moreover, averaging would introduce errors when the image to find lies across two averaging cells.

Partially hidden? OK. Then I think you cannot find an exact match, only a "probability of match".

You are getting into AI territory FAST. What is this for? Maybe there is a workaround :)

Imagine an HTML5 island in fake 3D: it looks 2D, without perspective, and bits of landscape (trees, rocks and grass) make up the whole scene. Some objects are hidden in this landscape, and they must be found to complete a quest. That is the goal of my shader: find the objects so I can click on them and finish the quest. The problem is that the landscape can be shown at different sizes (zoom), and the objects can be partly covered by trees or rocks (hence the detected percentage).

Then you need to move into image recognition code, but to me this is starting to sound like a game bot used for cheating.

What are you using this for?

Hi Stainless,
It's just to help find objects on the map; it's not for cheating. I just hope the program will be faster and better than my eyes.
Beyond the goal of recognizing objects, I find developing this tool really interesting, so I'd like to see it through.

Why do you need this?

Have you considered turning on mipmapping and then running your algorithm on a smaller mip level?
That should offload a lot of the work to the GPU and reduce the number of samples, which means much less work.

I imagine there is a sweet spot where your routine still works and the image doesn't get so distorted that you get false positives.

If you think about it, if you mip down both the image you are looking for and the image you test against, then the values blended together in the original would match the mipped-down values of the tested image, if the object is present. So you would reduce the size of the whole computation at the cost of a small increase in error.
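Here is a minimal coarse-to-fine sketch of the mip idea, written in Python as a CPU reference (names are hypothetical). `downsample` plays the role of one mip level (a 2×2 box average). The search runs on the reduced images, and only the full-resolution positions under a coarse hit are re-scored. There are two caveats. An object whose position is not a multiple of `2 ** levels` straddles the 2×2 blocks and may be missed at the coarse level. Averaging also blends the colours of transparent template pixels into the object's edges.

```python
def downsample(img):
    """One mip level: average each 2x2 block (an odd last row/column is dropped)."""
    return [[tuple(sum(img[2 * y + dy][2 * x + dx][c] for dy in (0, 1) for dx in (0, 1)) / 4.0
                   for c in range(4))
             for x in range(len(img[0]) // 2)]
            for y in range(len(img) // 2)]

def ratio(screen, template, x0, y0, tol=0.05):
    """Share of opaque template pixels whose RGB matches the screen within tol."""
    pairs = [(screen[y0 + ty][x0 + tx], t)
             for ty, row in enumerate(template)
             for tx, t in enumerate(row) if t[3] > 0.0]
    if not pairs:
        return 0.0
    hits = sum(all(abs(s[c] - t[c]) <= tol for c in range(3)) for s, t in pairs)
    return hits / len(pairs)

def positions(screen, template):
    """All top-left positions where the template fits inside the screen."""
    return [(x, y)
            for y in range(len(screen) - len(template) + 1)
            for x in range(len(screen[0]) - len(template[0]) + 1)]

def coarse_to_fine(screen, template, levels=1, min_ratio=0.2):
    """Search a smaller mip level, then re-score the full-res positions under each hit."""
    s, t = screen, template
    for _ in range(levels):
        s, t = downsample(s), downsample(t)
    scale = 2 ** levels
    max_x = len(screen[0]) - len(template[0])
    max_y = len(screen) - len(template)
    results = []
    for cx, cy in positions(s, t):
        if ratio(s, t, cx, cy) > min_ratio:
            for dy in range(scale):
                for dx in range(scale):
                    x, y = cx * scale + dx, cy * scale + dy
                    if x <= max_x and y <= max_y:
                        r = ratio(screen, template, x, y)
                        if r > min_ratio:
                            results.append((r, x, y))
    return sorted(results, reverse=True)   # best ratio first
```

Each extra level divides the number of coarse positions and the template size by four, which is the saving; the tolerance question raised below is exactly the trade-off of how many levels you can afford.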

I don't need it as such; it's just so boring to look for objects on the map, which can sometimes take half an hour. It's only to reduce the time wasted on this task.

Willmotil, thank you for your suggestion, but as you said, mipmapping would produce too many false positives. The tolerance is currently fixed at 0.05 because 0.2 gives me too many similar matches, and 0.1 was not precise enough.
But thank you for your suggestion.

For example, here are some of the best search results.

Here are 4 ordinary objects found without difficulty:

A scarecrow behind a dead tree:

A leather piece behind a few leaves and tree branches: