I have a pointer to an array of bytes that is populated by a C++ application (Snes9x). Basically what I need is to read that array and render it to the screen using MonoGame. The array points to Snes9x's backbuffer and it's (at least) 256 * 224 * 2 bytes big (width * height * bytes per pixel).
My first idea was to convert that array into a Texture2D and draw it to the screen, but I think this would be very slow since I'd have to update the texture 60 times per second. Plus, I'd also have to convert the 16bpp pixel format to RGB before updating the texture.
The problem is that modern GPU architecture is very different from the old ways. On something like the NES, the hardware had direct access to each pixel; on modern GPUs, textures are stored in the graphics card's memory and handled by the GPU, driven by a small number of instructions from the CPU, so there is no need to tell each pixel what to display. So on modern GPUs you would have to use very slow processes like setting each pixel every frame, and I believe this is not MonoGame-specific, it is the architecture of modern graphics.
You can pass an insane amount of data per frame between the CPU and GPU; that bus is insanely fast these days. We have entire techniques built on it, for example virtual texturing. It is not a very slow process, it is an extremely fast one: not as fast as GPU-to-GPU transfers (hence it is considered a bottleneck in some cases), but still very fast, with very, very high bandwidth. A few thousand bytes per frame is absolutely, absolutely nothing; we can do way more. We also have OS-level features built for high CPU-to-GPU traffic, see DirectStorage from Microsoft. Not that it is applicable at all in this case; I am just pointing out why the idea that modern architectures aren't built for high-bandwidth CPU-to-GPU traffic is wrong.
Additionally, for future optimization, we can create buffers with the Dynamic usage flag (in DX, though OpenGL has its own variant). We sacrifice some GPU read speed for CPU write speed, which is useful for buffers we need to write into MULTIPLE times PER update. For example, it is common for constant buffers.
CPU utilization is not a good indicator… if you turn off FixedTimeStep, your utilization will be (almost) 100% (minus IO wait etc.), no matter what you do.
So with a fixed 60 FPS you have 16 ms per frame, and you can basically use it however your game needs. Copying a flat array into an array of double the size isn't really that expensive, so maybe you have bottlenecks somewhere else?
Unfortunately I have no idea how the 2 bytes of the SNES are encoded to carry 3 values, but there is a SurfaceFormat for a single 16-bit value and one with 2x8 bit, which should at least spare you the conversion on the CPU side; you can do it on the GPU instead.
edit: oh someone already found a good SurfaceFormat, so never mind