My fluid-sprite collision detection is slow because of glReadPixels

Started by April 05, 2025 12:25 AM
7 comments, last by taby 5 days, 16 hours ago

In my latest game prototype, I am using the following process per frame:

  1. generateFluidStampCollisionsDamage()

    -render collision data using the detectCollisionProgram

    -read the contents of the texture on the CPU using glReadPixels

    -for each collision point in that texture that has non-zero alpha, see which stamp(s) are at that location

  2. processCollectedBlackeningPoints

    - blackens things
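For readers trying to picture step 1, here is a minimal CPU-side sketch of the scan it describes, not the actual code from the project. It assumes the pixels come back as tightly packed RGBA8 (what glReadPixels returns for GL_RGBA / GL_UNSIGNED_BYTE) and that each stamp can be approximated by an axis-aligned box. `Stamp`, `Hit` and `collectHits` are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stamp description: an axis-aligned box in texture pixels.
struct Stamp {
    int id;
    int x0, y0; // inclusive minimum corner
    int x1, y1; // exclusive maximum corner
};

// One (pixel, stamp) pair that needs damage applied.
struct Hit {
    int x, y, stampId;
};

// Scan a tightly packed RGBA8 image and report every stamp that overlaps
// a pixel whose alpha is non-zero. A pixel may produce several hits.
std::vector<Hit> collectHits(const std::vector<std::uint8_t>& rgba,
                             int width, int height,
                             const std::vector<Stamp>& stamps)
{
    assert(rgba.size() == static_cast<std::size_t>(width) * height * 4);
    std::vector<Hit> hits;
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            const std::size_t i =
                (static_cast<std::size_t>(y) * width + x) * 4;
            if (rgba[i + 3] == 0) continue; // alpha == 0: no collision here
            for (const Stamp& s : stamps) {
                if (x >= s.x0 && x < s.x1 && y >= s.y0 && y < s.y1)
                    hits.push_back({x, y, s.id});
            }
        }
    }
    return hits;
}
```

The inner loop is a brute-force test against every stamp. It is only meant to show the shape of the work that happens after the readback.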

As you can see, step 1b calls glReadPixels, which is ridiculously slow.

Do you have any ideas on how to redo this process so that it runs entirely on the GPU? I was thinking of somehow using a texture that contains the id of the stamp at location x,y. However, this only allows a collision to damage one stamp per collision pixel. That's not really optimal.

The source code is not the best to look at.

taby said:
I am using the following process per frame:

The following description does not help me at all to understand what you're doing.

taby said:
I was thinking of somehow using a texture that contains the id of the stamp at location x,y. However, this only allows a collision to damage one stamp per collision pixel.

Maybe, instead of using images, you could use a list of stamps? And maybe you can use mapped memory for the related SSBOs.

This way you can minimize the size of the data to transfer, and you can avoid the conversion of GPU internal texture formats.
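To make the list idea concrete, here is a rough CPU-side model of an append-only hit list, written in C++ so it can be run without a GL context. It only imitates what a shader could do with an SSBO and an atomic counter: reserve a slot, then write a small record. `HitRecord` and `HitList` are hypothetical names, and the record layout is an assumption. Note that SSBOs are core only from OpenGL 4.3 (or via ARB_shader_storage_buffer_object), so this would not work on a plain 3.3 context.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record layout: one entry per (pixel, stamp) collision.
struct HitRecord {
    std::uint16_t x, y;
    std::uint32_t stampId;
};

// CPU model of a fixed-capacity buffer plus an atomic append counter.
class HitList {
public:
    explicit HitList(std::size_t capacity) : records_(capacity), count_(0) {}

    // Reserve the next slot and write the record into it.
    // Returns false when the list is full and the record was dropped.
    bool append(HitRecord r) {
        const std::uint32_t slot =
            count_.fetch_add(1, std::memory_order_relaxed);
        if (slot >= records_.size()) return false;
        records_[slot] = r;
        return true;
    }

    // Number of valid records, clamped to the capacity.
    std::size_t size() const {
        const std::size_t n = count_.load(std::memory_order_relaxed);
        return n < records_.size() ? n : records_.size();
    }

    const HitRecord& operator[](std::size_t i) const {
        assert(i < size());
        return records_[i];
    }

    // Call at the start of each frame.
    void reset() { count_.store(0, std::memory_order_relaxed); }

    // Bytes worth transferring: the counter plus the used records only.
    std::size_t bytesToTransfer() const {
        return sizeof(std::uint32_t) + size() * sizeof(HitRecord);
    }

private:
    std::vector<HitRecord> records_;
    std::atomic<std::uint32_t> count_;
};
```

Because each record carries its own stamp id, two stamps overlapping the same pixel simply produce two records, which sidesteps the one-stamp-per-pixel limit of an id texture. The amount of data read back also scales with the number of hits and not with the texture size.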

Also: If you want to download stuff from the GPU, it helps to accept some frames of latency. E.g. if your swapchain has 3 images, you likely end up with a latency of 3 frames, which requires you to keep 3 buffers around.
This means the data you work with is slightly outdated, but you avoid stalls from waiting for the transfer to complete.
(I've never done this and can't help with details)

No clue at all about feasibility, but is it an option to delay display by one frame?

That way, you can maybe do the second part on the GPU and then display it.

Yes. Some of the AIs recommended that I use a pixel buffer object (PBO), so that glReadPixels does not block the way a direct read into client memory does.

I have yet to integrate that into my code. Is this also what you recommend?

Well, I tried the PBO method, and there's no speed up.

taby said:
Is this also what you recommend?

Reading a bit about PBOs, yes you should probably use that to implement the transfer.
But you may still want double / triple buffers on top of that, to ensure the data is ready and you never need to wait for it.

That's at least what I think gives you the best performance (trading it for latency).
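A minimal sketch of the bookkeeping behind that, assuming a ring of N readback buffers (for example N PBOs). Each frame starts an asynchronous read into one buffer and maps a different, older one. The GL calls appear only in comments, and `FrameSlots` and `slotsForFrame` are hypothetical names. If the same PBO is mapped right after the glReadPixels that fills it, the map has to wait for the transfer, which is one possible reason a PBO port shows no speedup.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Which buffers to touch on a given frame.
struct FrameSlots {
    std::size_t   writeSlot; // bind this PBO, then call glReadPixels into it
    bool          hasRead;   // false during the first frames (nothing old enough)
    std::size_t   readSlot;  // map this PBO on the CPU (valid only if hasRead)
    std::uint64_t readFrame; // frame whose data readSlot holds
};

// With slotCount buffers, the CPU sees data that is slotCount - 1 frames old.
FrameSlots slotsForFrame(std::uint64_t frame, std::size_t slotCount)
{
    assert(slotCount >= 2); // one buffer would force a wait every frame
    FrameSlots s{};
    s.writeSlot = static_cast<std::size_t>(frame % slotCount);
    const std::uint64_t latency = slotCount - 1;
    s.hasRead = frame >= latency;
    if (s.hasRead) {
        s.readFrame = frame - latency;
        s.readSlot  = static_cast<std::size_t>(s.readFrame % slotCount);
    }
    return s;
}
```

With two buffers the damage pass works on last frame's collisions. With three it is two frames behind but has more slack before a map could block. A fence (glFenceSync / glClientWaitSync) per slot could be added to confirm the data is ready before mapping.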

However, if Switch is your only target platform, the CPU and GPU share memory there, so you do not need a transfer at all to read the data from the CPU. You only need to synchronize access (and take care of the HW tiling format / conversion with low-level APIs).

Hmm. So, does Switch/Switch2 support OpenGL?

Sorry, I should have Googled this before asking you. I see that Switch supports OpenGL. This is great. My game uses OpenGL 3.3, and I'm very interested in seeing how fast glReadPixels runs on the Switch / Switch 2.

I signed up for a Nintendo Developer account.
