Storing OpenCL-generated data directly into a Texture2D

(This question was also asked on Stack Overflow, but got no answer: "C# Rendering OpenCL-generated image".)

I discovered OpenCL a while ago through a wrapper called Cloo, and immediately became a huge fan. With OpenCL, I was able to speed up some calculations from a minute to less than a second. For example, the first thing I made was a program to render a dynamic Julia fractal (by dynamic I mean the fractal changes every frame because the base point in the complex plane changes). The only thing you need to know about a Julia fractal is that every pixel can be calculated independently, so the task is easily parallelizable. I got it working by calculating the pixel data with OpenCL, reading the data back to the CPU and storing it in a Texture2D, which sends it to the GPU again. The problem is that I only get about 10 frames a second. I realised that the constant communication between the GPU and the CPU is the real bottleneck.
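To make the "every pixel can be calculated independently" point concrete, here is a minimal CPU-side sketch in plain C. It is not the poster's kernel: the function names, the viewport of [-2, 2] on both axes and the escape radius of 2 are assumptions chosen for illustration. Each loop iteration writes only its own output element, so the body of the inner loop is roughly what a single OpenCL work-item would do.

```c
#include <assert.h>
#include <stddef.h>

/* Escape-time count for one pixel of the Julia set z -> z*z + c.
   (zx, zy) is the pixel's position in the complex plane, (cx, cy) is the
   base point that changes every frame. Names are illustrative. */
int julia_iterations(double zx, double zy, double cx, double cy, int max_iter)
{
    int i = 0;
    /* |z|^2 <= 4 is the usual escape test for radius 2 (an assumption here). */
    while (i < max_iter && zx * zx + zy * zy <= 4.0) {
        double next_x = zx * zx - zy * zy + cx;
        zy = 2.0 * zx * zy + cy;
        zx = next_x;
        ++i;
    }
    return i;
}

/* Fill a width*height buffer with iteration counts. No pixel reads another
   pixel's result, so the two loops could be replaced by a 2D OpenCL
   work-item grid without changing the output. */
void render_julia(int *out, int width, int height,
                  double cx, double cy, int max_iter)
{
    assert(out != NULL && width > 0 && height > 0);
    for (int py = 0; py < height; ++py) {
        for (int px = 0; px < width; ++px) {
            /* Map the pixel centre into the assumed [-2, 2] viewport. */
            double zx = -2.0 + 4.0 * (px + 0.5) / width;
            double zy = -2.0 + 4.0 * (py + 0.5) / height;
            out[(size_t)py * width + px] =
                julia_iterations(zx, zy, cx, cy, max_iter);
        }
    }
}
```

In this version the cost per frame is the computation alone. The 10 fps described above comes from the extra read-back and re-upload of the buffer, which this sketch does not model.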

My question: is it possible to directly store data calculated by OpenCL in a Texture2D without involving the CPU?

By the way, I also tried using a shader for this, but HLSL doesn’t allow enough arithmetic slots. GLSL does, so if there is a way to use GLSL within MonoGame, that would also solve my problem.

Some quick research showed it should be possible to map a CL output buffer to a GL texture. This presentation (page 36) shows some code to do it.

The Texture2D class does not expose the GL texture handle, so you would need to build MonoGame from source to get access to it, or use reflection on the Texture2D class to find the glTexture field.

If I understand correctly, there currently isn’t a built-in way to do this. Will this be a feature in the future?

You have to get the native handle/resource ID to use the OpenCL sharing API, which means reflection to get to the relevant object (https://github.com/MonoGame/MonoGame/issues/5552).

If you haven’t used sharing before, be sure to enumerate the supported extensions first, as sharing support is unreliable IME. Don’t forget the lock/unlock stage either (clEnqueueAcquireGLObjects / clEnqueueReleaseGLObjects in the GL case), or changes won’t propagate.
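A rough sketch of the extension check, in plain C so it can be tried without an OpenCL runtime. The assumption is that the extension string has already been fetched with clGetDeviceInfo and CL_DEVICE_EXTENSIONS, which returns a space-separated list, and that the name to look for is cl_khr_gl_sharing (the OpenGL case; other backends and some platforms use a different extension name). The helper name has_extension is made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `name` appears as a whole space-separated token in
   `ext_list`, 0 otherwise. A bare strstr() is not enough, because one
   extension name can be a prefix of another. */
int has_extension(const char *ext_list, const char *name)
{
    assert(ext_list != NULL && name != NULL);
    size_t name_len = strlen(name);
    if (name_len == 0) {
        return 0;
    }
    const char *p = ext_list;
    while ((p = strstr(p, name)) != NULL) {
        /* The match must start at the beginning of the list or after a space... */
        int starts_token = (p == ext_list) || (p[-1] == ' ');
        /* ...and must be followed by a space or the end of the string. */
        char after = p[name_len];
        int ends_token = (after == ' ') || (after == '\0');
        if (starts_token && ends_token) {
            return 1;
        }
        p += 1; /* keep searching past this partial match */
    }
    return 0;
}
```

If the check fails, the safe fallback is the copy-through-the-CPU path that already works, just slower.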

HOST_PTR storage is tangentially related: avoid it for textures unless you do everything in OpenCL, as Allegorithmic has the patents on shared-storage mixing of GPU and CPU operations for the purposes of textures (this applies to mobiles and integrated GPUs, where the CPU and GPU use the same memory).