Fantasy Console: Simulate low-res display with byte-array

Hello :slight_smile:

I am currently trying to create a small prototype system with real-ish memory.
Currently I am working on displaying the bytes set in the virtual system memory, and it works fine… I guess, but I am not very proud of the solution, and to be honest I feel like I should switch to a faster language and another framework, like SFML or SDL.

--Edit-- Because I don't want to stick with a 128x128 resolution, I want to go to 256x192.
--Edit-- I also tried shaders.

How it works:

I have a byte array that holds 8192 bytes.
0x0000 - 0x03FF is system data, where I can define things like the mouse cursor sprite, input states, etc.
0x0400 - 0x13FF is display data; this is the actual content of the 128x128 screen. I use 2 bits per pixel.
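
Roughly like this (just an illustration; the constant names are made up, not the ones from the repo):

const int SystemDataStart  = 0x0000;              // 0x0000 - 0x03FF: cursor sprite, input states, ...
const int DisplayDataStart = 0x0400;              // 0x0400 - 0x13FF: screen content
const int DisplayDataSize  = 128 * 128 * 2 / 8;   // 2 bits per pixel -> 4096 bytes
byte[] memory = new byte[8192];                   // the whole virtual system memory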

I have done research on XNA and MonoGame and on ways to optimize the process of updating the actual data of the Texture2D.

At first I was just calling SetData on a 128x128 Texture2D in Update and simply rendered it with the SpriteBatch.

But I came across some posts saying it would be better to use a RenderTarget2D, draw the Texture2D to the RenderTarget2D, and then draw the RenderTarget2D with the SpriteBatch.
IDK why, but it reduced my update time.
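
Roughly, that draw path looks something like this (simplified; field names like _screenTarget and the 4x scale are not the actual ones from the repo):

protected override void Draw(GameTime gameTime)
{
    // Draw the 128x128 texture into the render target first...
    GraphicsDevice.SetRenderTarget(_screenTarget);
    _spriteBatch.Begin(samplerState: SamplerState.PointClamp);
    _spriteBatch.Draw(_output, Vector2.Zero, Color.White);
    _spriteBatch.End();
    GraphicsDevice.SetRenderTarget(null);

    // ...then draw the render target scaled up to the window.
    GraphicsDevice.Clear(Color.Black);
    _spriteBatch.Begin(samplerState: SamplerState.PointClamp);
    _spriteBatch.Draw(_screenTarget, Vector2.Zero, null, Color.White, 0f, Vector2.Zero, 4f, SpriteEffects.None, 0f);
    _spriteBatch.End();

    base.Draw(gameTime);
}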

The next thing I came across was calling SetData in a Task or a Thread. And my own idea was to only update the Texture2D if the memory actually changed.
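
That idea is basically a dirty flag; simplified (without the Task, and not the exact code from the repo), it looks like this:

private bool _displayDirty;

public void Poke(int address, byte value)
{
    _memory[address] = value;   // _memory is a placeholder for the virtual memory array
    if (address >= 0x0400 && address <= 0x13FF)
    {
        _displayDirty = true;   // the screen region changed, re-upload next frame
    }
}

protected override void Update(GameTime gameTime)
{
    if (_displayDirty)
    {
        // at most one SetData per frame
        _output.SetData(Util.FromBuffer(VirtualSystem.Instance.Peek(0x400, 4_096)));
        _displayDirty = false;
    }
    base.Update(gameTime);
}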

With all that in place I still feel very unsatisfied with the performance. It is not that bad, I guess, but I feel like I am missing something. I also want to improve the memory usage, and I guess for that to work I would have to implement my own Texture2D class, which seems kind of overkill to me though.

The reason I started this project is because I wanted to learn about Span<>, lol

Code can be found here: GitHub - SameplayerDE/Lega
The project I am talking about is

Lega.Monogame.Shared/OpenGL.Hoopfe

How bad is it? I can't imagine a SetData on a 128x128 texture being a big problem.

The most obvious optimization would be to reduce the number of bits you upload.
If every pixel is only 2 bits, but you are updating a texture where every pixel is 32 bits (as is the case with standard color textures), then there is a lot of waste.

You can change to a single-channel texture format. I think you can't go lower than 8 bits per pixel, so that would still be wasteful. You should pack the pixels more tightly: four 2-bit pixels fit into one 8-bit channel. You then unpack the pixels in a custom shader.
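
For example (rough, untested sketch; GetDisplayMemory() just stands in for wherever you read your 4096 display bytes from):

// Keep the 2-bit pixels packed and upload them as an 8-bit single-channel texture.
// 128 pixels per row / 4 pixels per byte = 32 bytes, so the texture is 32x128 texels.
var packed = new Texture2D(GraphicsDevice, 32, 128, false, SurfaceFormat.Alpha8);

byte[] displayBytes = GetDisplayMemory();   // placeholder for your 4096 bytes of packed display data
packed.SetData(displayBytes);               // 4 KB per upload instead of 64 KB of Color values

// In the pixel shader you sample this texture, rescale the sampled 0..1 value back to 0..255,
// shift/mask out the two bits that belong to the current screen pixel and use them as a palette index.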

1 Like

At a quick glance it seems he is using a Task to SetData to the texture literally PIXEL BY PIXEL. This needs to be rebuilt from scratch. As Markus said, a 128x128 texture update is a trivial, cheap operation; we can push way, way more data from CPU → GPU each frame, even on machines that count as ancient nowadays. 16384 subresource updates/buffer mappings is not.

In the given context I would strongly advise against writing a custom texture class and would revisit the usage of the existing functionality. Also, there is no point in trying to multithread this part of the process.

1 Like

This is how I currently handle converting the 2-bit pixels into an array of colors.
But I think Markus had a very good idea.

I will try to look into the shader thing, though I don't really know how to convert my custom 2-bit pixels to 8-bit pixels in a shader.

What do I actually pass to the shader? Like a custom IVertexType? Because I can't just pass the array; that would throw a shader build error, I guess.

private void OnMemoryChange(object sender, EventArgs eventArgs)
{
    Task.Run(() =>
    {
        _output.SetData(Util.FromBuffer(VirtualSystem.Instance.Peek(0x400, 4_096)));
    });
}
        public static Color[] FromBuffer(ReadOnlySpan<byte> data, int pal = 0)
        {
            // Each source byte packs four 2-bit pixels, so the output has 4x as many entries.
            var result = new Color[data.Length * 4];
            for (var i = 0; i < result.Length; i += 4)
            {
                var @byte = data[i / 4];
                int upper = @byte >> 4;
                int lower = @byte & 0x0F;

                // Split the byte into its four 2-bit color ids.
                int a = upper >> 2;
                int b = upper & 0x03;
                int c = lower >> 2;
                int d = lower & 0x03;

                result[i + 0] = GetColor(pal, a);
                result[i + 1] = GetColor(pal, b);
                result[i + 2] = GetColor(pal, c);
                result[i + 3] = GetColor(pal, d);
            }
            return result;
        }

DrawSprite(), SetPixel(), Poke() set pixels one by one. It's a bit hard to navigate that repo, as there are multiple versions of everything. At this amount of data it really isn't a throughput bottleneck.

1 Like

This is to set the memory of the virtual system. I am simulating the system memory with this.

I am just saying, make sure you are not using this from your repo:

        // The method header was cut off in the paste; judging by the calls in DrawSprite below
        // it is something like: public void SetPixel(int x, int y, int id, bool transparent)
        {
            if (x < 0 || y < 0 || x >= 128 || y >= 128)
            {
                return;
            }

            //maps x and y to the address
            var index = y * 32 + (x / 4);
            //which of the four 2-bit pixel slots inside the byte (0 = most significant pair)
            var region = x % 4;
            //color id 
            var color = (byte)(id % 4);
            //what is currently saved at this address
            var takenBy = _systemDisplayData.Peek(index);

            byte aa = (byte)(takenBy >> 6);
            byte ab = (byte)(takenBy >> 4 & 0x03);
            byte ba = (byte)(takenBy >> 2 & 0x03);
            byte bb = (byte)(takenBy >> 0 & 0x03);

            if (color == 0 && transparent)
            {
                return;
            }

            switch (region)
            {
                case 0:
                    aa = color;
                    break;
                case 1:
                    ab = color;
                    break;
                case 2:
                    ba = color;
                    break;
                case 3:
                    bb = color;
                    break;
            }

            aa = (byte)(aa << 6);
            ab = (byte)(ab << 4);
            ba = (byte)(ba << 2);

            Poke(_systemDisplayData.Offset + index, (byte)(aa + ab + ba + bb));
        }

        public void DrawSprite(int x, int y, int id)
        {
            var data = Peek(16 * id, 16);
            for (var col = 0; col < data.Length / 2; col++) // 8 sprite rows
            {
                for (var row = 0; row < data.Length / 8; row++) // 2 bytes per row, 4 pixels each
                {
                    var index = col * (2) + row;
                    var @byte = data[index];

                    byte aa = (byte)(@byte >> 6);
                    byte ab = (byte)(@byte >> 4 & 0x03);
                    byte ba = (byte)(@byte >> 2 & 0x03);
                    byte bb = (byte)(@byte >> 0 & 0x03);
                    var dstX = x + (row * 4);
                    var dstY = y + (col);
                    SetPixel(dstX + 0, dstY, aa, true);
                    SetPixel(dstX + 1, dstY, ab, true);
                    SetPixel(dstX + 2, dstY, ba, true);
                    SetPixel(dstX + 3, dstY, bb, true);
                }
            }
        }

As this ultimately ends up calling a function that calls SetData, actually 4 times for each pixel. I didn't verify whether this code is actually used; I've just noticed it is in there.

2 Likes

oh lol, I see. okay, imma fix this real quick. You mean because Poke is called for each SetPixel and therefore the event is invoked 4 times… I see. okay, wait

Okay, I fixed the issue with the event being called 4 times. I left the Task calling SetData in and checked the memory and performance with dotMemory and dotTrace.

These are the results from dotTrace and dotMemory without the Task calling SetData:

One Update call takes around 0.7 milliseconds without calling SetData in a Task.

With the Task it takes around 0.02 - 0.03 ms.

But I still feel like I can somehow improve the performance :confused:

Wait, did you optimize it from 4 SetData calls per pixel to 1 SetData call per pixel?
You have to optimize it to 1 SetData call per frame, not per pixel.
By the way, 0.03 ms is equivalent to 33 thousand fps. I would call that a decent frame rate :grinning:

1 Like

When memory changed I invoked a MemoryChange event.
I was not planning on using this event to call SetData tbh, but I think sleepy me just did that.

public void DrawSprite(int x, int y, int id)
        {
            var data = Peek(16 * id, 16);
            for (var col = 0; col < data.Length / 2; col++)
            {
                for (var row = 0; row < data.Length / 8; row++)
                {
                    var index = col * (2) + row;
                    var @byte = data[index];

                    byte aa = (byte)(@byte >> 6);
                    byte ab = (byte)(@byte >> 4 & 0x03);
                    byte ba = (byte)(@byte >> 2 & 0x03);
                    byte bb = (byte)(@byte >> 0 & 0x03);
                    var dstX = x + (row * 4);
                    var dstY = y + (col);
                    SetPixel(dstX + 0, dstY, aa, true);
                    SetPixel(dstX + 1, dstY, ab, true);
                    SetPixel(dstX + 2, dstY, ba, true);
                    SetPixel(dstX + 3, dstY, bb, true);
                }
            }
        }

SetPixel invokes the event once… so there was a big problemo, xD.

But… funny… it did not change the update time. Maybe that's because I was calling SetData in a Task.

Now I am using the original plan: when memory in the display memory range changes, I queue a displayCommand struct.

When I call VirtualSystem.Instance.Apply(), I check if there is something in the queue and then invoke a display memory change event, which then calls SetData.

Yeah, it runs smoothly, but I still feel like I am doing something wrong, idk. Maybe I am just overthinking it.

So are you still calling SetData multiple times per frame? You should SetData only once.

If you spawn tasks from your update function, then those tasks won’t be processed as part of your update function, so they don’t add to the update function timing.

No, currently I am calling it once per frame.

public void Apply()
{
    var queued = _commands.Count > 0;
    while (_commands.Count > 0)
    {
        var command = _commands.Dequeue();
        //Future Stuff Maybe IDK
    }
    if (queued)
    {
        OnDisplayMemoryChange(EventArgs.Empty);
    }
}

Your profiling data shows that every frame 9.9 ms are spent in PlatformSetDataBody().
That's a lot. Are you sure SetData is called once per frame? Can you verify that by setting a breakpoint in your update function and a breakpoint on SetData?

1 Like

Yes, it is called once per frame. I just checked.

Why would 9.9 ms be spent on a single SetData call? That doesn’t make any sense. Perhaps you are calling SetData from multiple places in your code?

I am just using it once in Game1.
I will look into it and try to find the reason for this.

The CPU can't be the problem because it is not that bad.

But I also think that 9.9 ms is too much for a SetData.

_output.SetData(Util.FromBuffer(VirtualSystem.Instance.Peek(0x400, 4_096)));

// Virtual system
public ReadOnlySpan<byte> Peek(int address, int bytes)
		{
			return _systemMemory.Peek(address, bytes);
		}

// VirtualMemory
public ReadOnlySpan<byte> Peek(int address, int bytes)
        {
            if (!Contains(address, bytes))
            {
                throw new Exception();
            }
            return (ReadOnlySpan<byte>)_data.AsSpan().Slice(address, bytes);
        }

//Util
        public static Color[] FromBuffer(ReadOnlySpan<byte> data, int pal = 0)
        {
            var result = new Color[data.Length * 4];
            for (var i = 0; i < result.Length; i += 4)
            {
                var @byte = data[i / 4];
                int upper = @byte >> 4;
                int lower = @byte & 0x0F;

                int a = upper >> 2;
                int b = upper & 0x03;
                int c = lower >> 2;
                int d = lower & 0x03;

                result[i + 0] = GetColor(pal, a);
                result[i + 1] = GetColor(pal, b);
                result[i + 2] = GetColor(pal, c);
                result[i + 3] = GetColor(pal, d);
            }
            return result;
        }

        public static Color GetColor(int pal, int a)
        {
            return Palettes[pal % Palettes.GetLength(0), a % 4];
        }

SetData will send all the data over the PCI bus to the GPU - this can potentially force a lot of things into a wait state as a consequence.

If you render to a texture and use that texture instead, the texture is already on the GPU, so you see a time benefit, as there is no transfer.

1 Like

Are you calling SetData from a thread?
That could add additional allocations and a 1-frame latency
between calling SetData and the time the command is sent to the GPU.

I can see in your trace log that SetData is spending 4.4 ms in BlockOnUIThread,
essentially doing nothing but waiting for the main thread to finish the current
game loop (Update/Draw/Present).
Then, at the end of the frame, it takes an additional 2.1 ms in StateActionHelper.
That's when it finally sets the data on the texture.

Your best course of action is to switch to the DirectX platform,
or you could do your own sync and call SetData from the main thread.
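
Something along these lines (rough sketch; _pendingFrame and the lock are just illustrative, adapt it to your setup):

private Color[] _pendingFrame;              // produced by the worker, consumed by Update
private readonly object _frameLock = new object();

private void OnMemoryChange(object sender, EventArgs eventArgs)
{
    Task.Run(() =>
    {
        // Do only the CPU-side conversion on the worker thread.
        var colors = Util.FromBuffer(VirtualSystem.Instance.Peek(0x400, 4_096));
        lock (_frameLock) { _pendingFrame = colors; }
    });
}

protected override void Update(GameTime gameTime)
{
    Color[] frame;
    lock (_frameLock) { frame = _pendingFrame; _pendingFrame = null; }
    if (frame != null)
    {
        _output.SetData(frame);             // SetData now runs on the main thread, no BlockOnUIThread wait
    }
    base.Update(gameTime);
}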