I have a pointer to an array of bytes that is populated by a C++ application (Snes9x). Basically what I need is to read that array and render it to the screen using MonoGame. The array points to Snes9x's backbuffer and it's (at least) 256 * 224 * 2 bytes big (width * height * bytes per pixel).
My first idea was to convert that array into a Texture2D and draw it to the screen, but I think this would be very slow since I'd have to update the texture 60 times per second. Plus, I'd also have to convert the 16bpp pixel format to RGB before updating the texture.
The problem is that modern GPU architecture is very different from the old ways. On something like the NES, the hardware had direct access to each pixel; on modern GPUs, textures are stored in the graphics card's memory and handled by the GPU, driven by a small number of instructions from the CPU, so there is no need to tell each pixel what to display. So on modern GPUs you would have to use very slow processes like setting each pixel every frame, and I believe this is not MonoGame-specific, it is the architecture of modern graphics.
You can pass an insane amount of data per frame between the CPU and GPU; that bus is insanely fast these days. We have entire techniques built on it, for example virtual texturing. It is not a very slow process, it is an extremely fast one: not as fast as GPU-to-GPU transfers (hence it is considered a bottleneck in some cases), but still very fast, with very, very high bandwidth. A few thousand bytes per frame is absolutely, absolutely nothing; we can do way more. We also have OS-level features built for high CPU-to-GPU traffic, see DirectStorage from Microsoft. Not that it is applicable at all in this case; I am just pointing out why the idea that modern architectures aren't built for high-bandwidth CPU-to-GPU traffic is wrong.
Additionally, for future optimization, we can create buffers with the Dynamic usage flag (in DX, though OpenGL has its own variant). We sacrifice some GPU read speed for CPU write speed, which is useful for buffers we need to write into MULTIPLE times PER update. For example, it is common for constant buffers.
CPU utilization is not a good indicator… if you turn off FixedTimeStep, your utilization will be (almost) 100% (minus IO wait etc.), no matter what you do.
So with a fixed 60 FPS you have 16 ms per frame, and you can basically use it however your game needs. Copying a flat array into an array of double the size isn't really that expensive, so maybe you have bottlenecks somewhere else?
Unfortunately I have no idea how the 2 bytes of the SNES are encoded to carry 3 values, but there is a SurfaceFormat for a single 16-bit value and one with 2x8 bit, which should at least spare you the conversion on the CPU side; you can do it on the GPU instead.
edit: oh someone already found a good SurfaceFormat, so never mind