Cube Mipmap Generation is Ridiculously Slow


I'm adding dynamic cubemap reflections and image-based ambient lighting to my engine (using OpenGL) and have encountered a problem that I'm not sure how to handle. I can render the scene 6 times to a cube map successfully, and use the cube map for reflections.

However, when I try to generate mipmaps for the reflection cube map using glGenerateMipmap(), it is unbelievably slow (like 30-100ms for a tiny 8 pixel cube map). In the other places I am calling glGenerateMipmap() every frame, such as on a copy of the main framebuffer, it is about 1000 times faster. Surely I am doing something wrong to get such bad performance.

My rendering pipeline is like this (a rough sketch of the calls is shown after the list):

  • I create an FBO with a color cube map and depth cube map bound. This is done once on the first frame, and the FBO+maps are reused each frame.
  • When rendering reflections, I first bind the FBO, then render the scene 6 times, each time calling glFramebufferTexture2D() to switch to the right cube map face. After all faces are done, I unbind the FBO.
  • Then I generate mipmaps for the new cube map contents using glGenerateMipmap().
  • For testing purposes I am not even using the cube map for any further rendering.
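
A simplified sketch with illustrative names (not my exact code):

    // One-time setup: cube map color texture + FBO (depth cube map omitted here).
    GLuint cubeFBO = 0, cubeColor = 0;
    const int size = 8; // tiny face size, as in my test
    glGenFramebuffers(1, &cubeFBO);
    glGenTextures(1, &cubeColor);
    glBindTexture(GL_TEXTURE_CUBE_MAP, cubeColor);
    for (int face = 0; face < 6; ++face)
        glTexImage2D(GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, 0, GL_RGBA8,
                     size, size, 0, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
    glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
    glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

    // Each frame: render all 6 faces, then generate mips.
    glBindFramebuffer(GL_FRAMEBUFFER, cubeFBO);
    for (int face = 0; face < 6; ++face)
    {
        glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                               GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, cubeColor, 0);
        // ... attach the matching depth face, clear, render the scene ...
    }
    glBindFramebuffer(GL_FRAMEBUFFER, 0);

    glBindTexture(GL_TEXTURE_CUBE_MAP, cubeColor);
    glGenerateMipmap(GL_TEXTURE_CUBE_MAP); // this is the slow call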

These are the things I have tried to resolve this:

  • I tried unbinding the texture from the FBO before calling glGenerateMipmap(), with no effect.
  • I tried creating multiple FBO+map assemblies and rotating between them on each frame, with no effect.
  • If I never bind the FBO for rendering, then glGenerateMipmap() is fast (though the cube map contains garbage).
  • If I bind the FBO for rendering but don't draw anything, mip generation is fast. Even adding glClear() makes it slow again.
  • Calling glGenerateMipmap() again after the first time is fast.

My feeling is that this is somehow related to the texture being bound to an FBO at the same time, though there are lots of other places (e.g. shadow maps, water refraction pass) where I use a texture immediately after rendering to it in an FBO, without any problems. For instance, for refraction, I blit the main framebuffer to another texture (also bound to an FBO), and can generate mipmaps for it very quickly (30us). The only things particular to this situation are that the texture is a cube map, and that I'm rendering into it rather than calling glBlitFramebuffer().

One potential workaround is to render into a 2D texture instead, then glBlitFramebuffer() the 2D texture into the cube map faces after rendering. I haven't tried this and would like to avoid it.
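
If I did try it, each face would presumably be copied with something like this (untested sketch; tempFBO holds the 2D color texture, cubeFBO/cubeColor are the cube map FBO and texture from above):

    // Copy the 2D render target into one cube map face via a blit.
    glBindFramebuffer(GL_READ_FRAMEBUFFER, tempFBO);
    glBindFramebuffer(GL_DRAW_FRAMEBUFFER, cubeFBO);
    glFramebufferTexture2D(GL_DRAW_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, cubeColor, 0);
    glBlitFramebuffer(0, 0, size, size, 0, 0, size, size,
                      GL_COLOR_BUFFER_BIT, GL_NEAREST);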

Does anyone have any ideas or pointers for how to resolve this performance issue?


I see a lot of problems here. You don't say which engine it is, so is this your own OpenGL engine? Because "my engine" built on Unity and "my own engine" are actually different things.

___________

And further, in terms of how you understand the code: you don't think in terms of game programming; you are really a non-game programmer, hence all the problems.

___________

First, you must take a procedural approach to programming. All game programmers, and even advanced players, measure performance in FPS, not in microseconds or anything else; that is, the measurement is the number of frames per second.

___________

First of all, in the game sense you are not rendering a map, a cube, or some other individual object;

you are doing a total render of a scene in which the objects have already been prepared,

objects prepared for rendering: tiles placed into cubes, texturing applied.

___________

Essentially, the confusion in your approach to programming means that you yourself don't understand what purpose it all serves, and in the end you can't see where your mistakes are.

___________

And programming begins with an indicator interface that shows these processes as numbers while the program is executing, often without third-party profilers; that is, you build an internal self-monitoring structure into the run-time program, i.e. the one that is already running / in the process of execution.

___________

A custom engine always has an "engine name", or at least a "working title" for the engine.

Alice Corp Ltd

That's a whole lot of gibberish that completely ignores the OP's question, with some bad advice thrown in:

alice wolfraider said:
All game programmers, and even advanced players, measure performance in FPS, not in microseconds or anything else

It's much better to measure performance in time units (s, ms, us) than in inverse time units (FPS), because times add linearly, which makes comparisons meaningful. See the article FPS Versus Frame Time. How would you measure the "FPS" of one part of a frame? It just doesn't make sense. For this reason, almost all reputable academic papers use time units.
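
To make that concrete: frame time in milliseconds is 1000 / FPS. Dropping from 60 FPS to 50 FPS means 1000/50 − 1000/60 ≈ 3.3 ms of extra work per frame, while dropping from 30 FPS to 20 FPS means 1000/20 − 1000/30 ≈ 16.7 ms, even though both are "a 10 FPS drop". Conversely, a pass that costs 1 ms takes you from 300 FPS down to about 231 FPS, but from 30 FPS only down to about 29 FPS.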

It's entirely possible that glGenerateMipmap is doing the work client-side (host/CPU) instead of server-side (GPU) to generate the mipmaps. If that is the case, there will be a sync point for the server-to-host (GPU-to-CPU) transfer and the subsequent host-to-server (CPU-to-GPU) transfer of the generated mips. I can't think of a way to validate this at the moment, but have you tried the following?
1. Create your cube map and allocate all the mip levels up front.
2. Each frame, render into each cube map face as before.
3. For each cube map face and each mip level ≥ 1, bind that level to an FBO.
4. Use a simple 'texture copy' shader (full-screen quad) to draw into the render target, using the level 0 face as the source.
This would result in a mip-map downsample as long as you set up your viewports correctly. The only caveat is that your source and destination will be the same texture, albeit with different mip levels being read from and written to. I believe this is valid GL behavior.

It's a little more work, but it does keep everything GPU-side; a rough sketch follows.
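
Roughly something like this (untested sketch; mipFBO, copyProgram, cubeColor and baseSize are placeholder names, the full-screen quad / shader setup is omitted, and here each level reads the previous level, which builds the same chain progressively):

    // Build the cube map's mip chain by rendering level-1 into level
    // with a simple copy/downsample shader, entirely on the GPU.
    glBindFramebuffer(GL_FRAMEBUFFER, mipFBO);
    glUseProgram(copyProgram); // samples the cube map in the direction of the current face
    glBindTexture(GL_TEXTURE_CUBE_MAP, cubeColor);
    int size = baseSize;
    for (int level = 1; size > 1; ++level)
    {
        size /= 2; // destination mip size
        // Clamp sampling to the previous level only, so the level being
        // rendered to is never also readable (avoids a feedback loop).
        glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_BASE_LEVEL, level - 1);
        glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MAX_LEVEL, level - 1);
        for (int face = 0; face < 6; ++face)
        {
            glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                                   GL_TEXTURE_CUBE_MAP_POSITIVE_X + face, cubeColor, level);
            glViewport(0, 0, size, size);
            // ... set the face direction uniform, draw the full-screen quad ...
        }
    }
    // Restore the defaults so the full mip chain is usable when sampling later.
    glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_BASE_LEVEL, 0);
    glTexParameteri(GL_TEXTURE_CUBE_MAP, GL_TEXTURE_MAX_LEVEL, 1000);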


Thanks for the idea @cgrant. I think you are right about the CPU-GPU sync.

I tested the same code on another machine with a different GPU (Nvidia 1070) and don't observe any slowness there, so I'm thinking it is something specific to the GPU or driver on my main development machine, which has an old integrated Intel GPU (HD 4600).

The likely explanation is that calling glGenerateMipmap() introduces a sync point, forcing the GPU to execute all buffered commands up to that point. Since the GPU is slow, this takes a long time (it's already running at only 10 FPS without any cubemap work). This shows up in my CPU-side profiling as a huge amount of time spent in glGenerateMipmap(), but I think that is just an artifact of how I measure time. If I measure the whole frame time, it doesn't change much when I call glGenerateMipmap().
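
One way to confirm this would be a GL timer query around the call, since that measures the GPU-side cost no matter where the driver decides to block the CPU (rough sketch):

    // Measure the GPU time spent in glGenerateMipmap(), independent of CPU stalls.
    GLuint query = 0;
    glGenQueries(1, &query);

    glBeginQuery(GL_TIME_ELAPSED, query);
    glGenerateMipmap(GL_TEXTURE_CUBE_MAP);
    glEndQuery(GL_TIME_ELAPSED);

    // Read the result later (e.g. next frame) to avoid introducing another stall.
    GLuint64 nanoseconds = 0;
    glGetQueryObjectui64v(query, GL_QUERY_RESULT, &nanoseconds);
    // nanoseconds / 1.0e6 gives the GPU cost in milliseconds.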

cgrant said:
4. Use a simple 'texture copy' shader (full-screen quad) to draw into the render target, using the level 0 face as the source. This would result in a mip-map downsample as long as you set up your viewports correctly.

The approach won't filter across neighboring cube faces at edges and corners, which will likely cause visible seams in reflections.
If glGenerateMipmap() can't be used, you probably have to take care of this detail yourself. (I'm not sure the GL function is specified to do proper prefiltering across faces anyway.)

Maybe using GPU profiling tools would give more insight into what's happening.

EDIT:

Oops, I think I'm wrong with my prefiltering worries. IIRC, there is no need to include other cube faces when generating mip maps. We only need to enable seamless cube map filtering when sampling the texture afterwards.

JoeJ said:
there is no need to include other cube faces when generating mip maps

I think there is, if you want it to be seamless. That's what glEnable( GL_TEXTURE_CUBE_MAP_SEAMLESS ) is supposed to do. Without it you only get filtering within each face, which does produce visible seams at the cube edges. Ideally you would downsample the cube map using filtering over a solid angle, which would include neighboring faces at the borders. It would be something similar to what is done in this tutorial to generate irradiance cube maps (but without the blurring).
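
For reference, in core GL (3.2+) this is a single global enable rather than a per-texture parameter (unless the ARB_seamless_cubemap_per_texture extension is available):

    // Enable once at init: linear filtering then blends across adjacent
    // cube faces at edges, for the base level and the mips.
    glEnable(GL_TEXTURE_CUBE_MAP_SEAMLESS);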

Aressera

__________

It seems you were given some more advice.

Let's start again:

_______________

FPS is the number of frames per second.

You don't need more than 24 frames per second.

FPS is essentially a general indicator of graphics output power;

that is, around 30 FPS is enough, though many people work with and want 60 FPS.

But let's look at what FPS means in programming,

not during the actual operation of the program:

we divide one second by the number of frames (however many are sufficient for us)

and get an averaged number that serves as an indicator.

You can compare your microseconds with this indicator,

because the computer does not work the way you think.

In a way it works seamlessly: including the processor and cache,

the interaction of the data buses between the GPU and CPU, and also hash tables.

___________

continuing:

In gamedev the scene is rendered every frame,

but in fact only the window, that is, the screen, is rendered.

That is, only the visible area the camera is looking at takes part in rendering;

this is just a certain part of the scene and its objects, not all of them.

___________

You can also have previously created objects in *.obj format,

where "obj" is a standard format for engine objects and so on;

these objects are not generated internally but rather

are loaded and rendered as something more compact, as is standard in the industry.

___________

Now about OpenGL and transfers between the CPU and GPU:

somewhere there is a limit on the length of the command buffer

sent to the GPU in one pass; if that length is exceeded,

there will be either slowdown, a crash, or artifacts.

___________

I think you should have explained this to me and the others,

rather than referring to pseudo-scientific articles of unknown provenance.

Alice Corp Ltd

@alice wolfraider I think you need to sit down and reconsider your communication style. I understand that English may not be your native language, but it's important to know the limits of your own abilities and knowledge. Your posts are incoherent and largely irrelevant to this thread. You talk down to people with more experience, and much of the advice you offer is plainly bad advice. Please come down off Mount Dunning-Kruger.

I'm on the fence about whether you are an LLM or not. If so, you need better training data.

Aressera said:
Ideally you would downsample the cube map using filtering over a solid angle, which would include neighboring faces at the borders. It would be something similar to what is done in this tutorial to generate irradiance cube maps (but without the blurring).

Yeah, that's exactly what I had in mind when I thought mip generation would need access to all faces.

I wonder if such prefiltering is done for reflection probes too, not only for diffuse. I guess not, since we would need multiple cube maps for multiple cone angles. It's probably better to take multiple samples instead.

