
OpenGL loading in a second thread causing stutter in main thread?

Started by
20 comments, last by wintertime 3 years, 5 months ago

Well, no buffer upload doesn't mean you're not communicating with the GPU. Perhaps put a timer around your main thread and see what happens. glUniform calls could lead to a stall as well, since they force a CPU/GPU sync.

https://developer.nvidia.com/sites/default/files/akamai/gameworks/events/gdc14/AvoidingCatastrophicPerformanceLoss.pdf

Let me correct myself: glUniform does not create a sync point; it just enters a command into the queue. The stall must be coming from somewhere else.
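
As a rough sketch of the timer suggestion above, assuming a conventional C++ render loop (`renderFrame()` here is a placeholder for whatever the main thread already does each frame):

```cpp
#include <chrono>
#include <cstdio>

void renderFrame(); // placeholder for the existing per-frame draw code

void runLoop()
{
    using clock = std::chrono::steady_clock;
    auto previous = clock::now();

    for (;;)
    {
        renderFrame();

        auto now = clock::now();
        double ms = std::chrono::duration<double, std::milli>(now - previous).count();
        previous = now;

        // Flag frames that blow well past a 60 Hz budget (~16.6 ms),
        // which is where the stutter should show up.
        if (ms > 25.0)
            std::printf("long frame: %.2f ms\n", ms);
    }
}
```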


mr.otakhi said:

The bottleneck could be the CPU/GPU communication: the glBindBuffer and glBufferData calls. If you only have one GPU, both the main thread and the secondary thread are sharing that single communication pipe.

This is what I'm starting to suspect. But just to confirm I'm thinking clearly on this: say my main thread is looping, calling glDrawElements, and my secondary thread is loading data from a large image file into a texture buffer. While the secondary thread is sending the texture data to the GPU, would any call being made to glDrawElements then be slowed down because of the saturation of the pipe between CPU and GPU? Similar to, say, trying to browse the internet while you're uploading a large file that's maxing out your upload bandwidth?

whitwhoa said:

mr.otakhi said:

The bottleneck could be the CPU/GPU communication: the glBindBuffer and glBufferData calls. If you only have one GPU, both the main thread and the secondary thread are sharing that single communication pipe.

This is what I'm starting to suspect. But just to confirm I'm thinking clearly on this: say my main thread is looping, calling glDrawElements, and my secondary thread is loading data from a large image file into a texture buffer. While the secondary thread is sending the texture data to the GPU, would any call being made to glDrawElements then be slowed down because of the saturation of the pipe between CPU and GPU? Similar to, say, trying to browse the internet while you're uploading a large file that's maxing out your upload bandwidth?

I think if that were the case, every single OpenGL app that uses streaming would end up having the exact same issue you are having. My application uses a shared context for resource upload in a dedicated thread, and I have not experienced the issue you are seeing. The application streams terrain elevation and color texture data as you move around, as well as mesh geometry, etc. An OpenGL context has its own command queue, so to speak. While the dispatching may not be parallel, as that is up to the driver implementation (but let's assume it's not), OpenGL still has other means to transfer data asynchronously, to avoid the issue you are describing above.
In the case you mention above:
1. Is that texture being used in any current draw call? Huge issue if it is.
2. How are you guaranteeing that this texture has finished uploading before using it on the main context? If you are depending on the gl* command returning, then that approach is flawed.
You posted the link to your GPU class, but that in itself is not enough to tell how the class is being used. The main application code that uses that class to do the per-frame draw will be needed.
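
A minimal sketch of what point #2 could look like with a fence object (assuming shared contexts and OpenGL 3.2+; the function names here are illustrative, not from the engine):

```cpp
#include <GL/glew.h> // or whichever GL loader the project already uses

// Loader thread: call after the last glTexImage2D/glTexSubImage2D for the texture.
GLsync insertUploadFence()
{
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush(); // ensure the fence (and the preceding uploads) are actually submitted
    return fence;
}

// Main thread: poll before the first draw that samples the texture.
bool uploadFinished(GLsync fence)
{
    GLenum status = glClientWaitSync(fence, 0, 0); // zero timeout: poll, don't block
    if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED)
    {
        glDeleteSync(fence);
        return true;
    }
    return false;
}
```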

cgrant said:

I think if that were the case, every single OpenGL app that uses streaming would end up having the exact same issue you are having. My application uses a shared context for resource upload in a dedicated thread, and I have not experienced the issue you are seeing. The application streams terrain elevation and color texture data as you move around, as well as mesh geometry, etc. An OpenGL context has its own command queue, so to speak. While the dispatching may not be parallel, as that is up to the driver implementation (but let's assume it's not), OpenGL still has other means to transfer data asynchronously, to avoid the issue you are describing above.
In the case you mention above:
1. Is that texture being used in any current draw call? Huge issue if it is.
2. How are you guaranteeing that this texture has finished uploading before using it on the main context? If you are depending on the gl* command returning, then that approach is flawed.
You posted the link to your GPU class, but that in itself is not enough to tell how the class is being used. The main application code that uses that class to do the per-frame draw will be needed.

Thanks for clarifying that! To answer your questions:

  1. The textures being uploaded on the load thread are not currently used in the draw call. Each “Scene” is its own encapsulated entity, and the context being used for it does not know of the context being used on the main draw thread or any of its elements.
  2. You are correct that I am not currently checking to ensure the texture has finished uploading before using it on the main context; I am simply depending on the gl* command returning. However, once I switch the contexts the issue is gone, and I'm not seeing any artifacts or incomplete textures (which doesn't necessarily mean there are none, so this is something I'll probably need to look into as well).

The full engine code can be found in this github repo/branch. However, that is just the core engine code; I will need to upload a simple usage example, as the current project would be too unwieldy to navigate. I will work on that and update this post once I have something posted.

*EDIT*

I have uploaded a stripped-down version of my existing project which simply includes main() as a bootstrapper and the two `Scenes` that are switched between. Within each scene class, note the outerLoop() member where vel::App::get().loadNextScene() is called. This is the method which initiates the secondary thread for loading the next scene.

I will check out the codebase, or the example when posted, when I get a chance. Also, to clarify my previous question #2: this only matters if the textures are created/modified in a different context. If they are within the same context, then OpenGL will ensure the commands are dispatched in the order submitted, meaning that the texture will be ‘complete’ by the time any subsequent GL call that uses said texture is submitted. In addition, if you are making a context current every frame or even periodically (wglMakeCurrent, glXMakeCurrent), you are going to see hiccups in your frames, as making a context current will flush the previous context per the docs for WGL: https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-wglmakecurrent. The main idea is that this really should be a one-time deal. This in turn forces one to think more about the intent of their application and how to structure it to abide by that limitation.
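
A sketch of the “one time deal” idea on the loader thread, using GLFW's shared-context support purely as an example (the engine's actual windowing layer may differ):

```cpp
#include <GLFW/glfw3.h>
#include <thread>

void loadSceneResources(); // placeholder for the actual upload work

void startLoaderThread(GLFWwindow* mainWindow)
{
    // An invisible window created solely to get a context that shares
    // objects (textures, buffers) with the main context.
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
    GLFWwindow* loaderContext = glfwCreateWindow(1, 1, "", nullptr, mainWindow);

    std::thread([loaderContext]() {
        // Made current exactly once for the lifetime of the thread,
        // not per upload and not per frame.
        glfwMakeContextCurrent(loaderContext);
        loadSceneResources();
        glfwMakeContextCurrent(nullptr);
    }).detach();
}
```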

cgrant said:

I will check out the codebase, or the example when posted, when I get a chance. Also, to clarify my previous question #2: this only matters if the textures are created/modified in a different context. If they are within the same context, then OpenGL will ensure the commands are dispatched in the order submitted, meaning that the texture will be ‘complete’ by the time any subsequent GL call that uses said texture is submitted. In addition, if you are making a context current every frame or even periodically (wglMakeCurrent, glXMakeCurrent), you are going to see hiccups in your frames, as making a context current will flush the previous context per the docs for WGL: https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-wglmakecurrent. The main idea is that this really should be a one-time deal. This in turn forces one to think more about the intent of their application and how to structure it to abide by that limitation.

Updated the post above with a link to a stripped-down version which shows a basic usage example to go along with the engine code.

To clarify, I am only switching between contexts once in order to load a new scene. For example, think of a loading screen showing the user a progress bar or some animation, which needs to remain responsive while the next scene is loaded in the background. Once the main thread's context has been switched with the loading thread's context, no further context switching (or asset loading) of any kind happens until the next scene load is initiated. I may not have made that fact clear in the beginning.

If you feel like checking out the engine/example code that would be wonderful; if not, no worries. I know digging through someone else's code isn't the most pleasant thing to do sometimes. I greatly appreciate the assistance up to this point.

If you do decide to take a look, know that this is by no means a final solution, especially this branch, as I cut it simply to play around with multiple-context loading and wound up having to implement some things I'm not so proud of just to see it work. I plan on refactoring once I have this issue figured out.

To anyone who might stumble across this in the future: my issue turned out to be that I was auto-generating my mipmap levels in the second thread by calling glGenerateMipmap(). This turns out to be quite an expensive operation, so even when it's called from a separate thread (since you only have one GPU) it taxes the GPU enough to drastically degrade overall performance.

I solved this by manually creating the mipmaps for each texture and loading them in with subsequent calls to `glTexImage2D()`.
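
For reference, a sketch of that approach: one `glTexImage2D()` call per pre-computed level (the `MipLevel` struct is illustrative, not from the engine):

```cpp
#include <GL/glew.h>
#include <vector>

struct MipLevel
{
    int width;
    int height;
    const unsigned char* pixels; // RGBA8 data for this level
};

GLuint createTextureFromMipChain(const std::vector<MipLevel>& levels)
{
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    // Upload each pre-computed level explicitly; level 0 is the base image.
    for (size_t level = 0; level < levels.size(); ++level)
    {
        const MipLevel& m = levels[level];
        glTexImage2D(GL_TEXTURE_2D, static_cast<GLint>(level), GL_RGBA8,
                     m.width, m.height, 0, GL_RGBA, GL_UNSIGNED_BYTE, m.pixels);
    }

    // Tell GL the full chain is present so mipmapped filtering works.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL,
                    static_cast<GLint>(levels.size()) - 1);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}
```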

whitwhoa said:
This turns out to be quite an expensive operation, so even when it's called from a separate thread (since you only have one GPU) it taxes the GPU enough to drastically degrade overall performance.

I'm a little surprised by that, as GPU generation of mipmaps should be fast - it's a trivial box filter by default.

I wonder whether the stutters are actually caused by lock contention in the GPU driver between the two different threads. Have you tried moving the glGenerateMipmap calls to the main thread, after loading is complete? That would eliminate the possibility of weird thread handling in the driver.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]
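
A sketch of that suggestion, assuming the loader thread hands finished texture IDs back to the main thread (the handoff mechanism itself is up to the application):

```cpp
#include <GL/glew.h>

// Main thread, e.g. right after the scene swap, once per texture the
// loader thread uploaded at base level only.
void finalizeTextureOnMainThread(GLuint textureId)
{
    glBindTexture(GL_TEXTURE_2D, textureId);
    glGenerateMipmap(GL_TEXTURE_2D); // now runs on the main context, not the loader thread
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
}
```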

Perhaps WebGPU tries to solve this very problem for multi-threaded apps running on multi-core CPUs. Clearly, WebGL's implementation lacks the fine granularity needed for better parallelism.

https://hacks.mozilla.org/2020/04/experimental-webgpu-in-firefox/

What version of OpenGL are you working with? OpenGL 3.0 or below, which WebGL 2.0 derives from, seems to have these issues.

Overall, this separation will allow for complex applications on the web to stream data in one or more workers and create any associated GPU resources for it on the fly. Meanwhile, the same application could be recording work on multiple workers, and eventually submit it all together to GPUQueue. This matches multi-threading scenarios of native graphics-intensive applications and allows for high utilization of multi-core processors.

swiftcoder said:

whitwhoa said:
This turns out to be quite an expensive operation, so even when it's called from a separate thread (since you only have one GPU) it taxes the GPU enough to drastically degrade overall performance.

I'm a little surprised by that, as GPU generation of mipmaps should be fast - it's a trivial box filter by default.

I am stabbing in the dark here, but the OP may not have declared the texture up front as one with mip levels? In that case, memory copying and expansion take place.
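
A sketch of declaring the mip chain up front with immutable storage (`glTexStorage2D`, GL 4.2 or ARB_texture_storage), which avoids the reallocate-and-copy scenario described above:

```cpp
#include <GL/glew.h>
#include <algorithm>
#include <cmath>

GLuint createImmutableTexture(int width, int height)
{
    // Number of levels for a full mip chain down to 1x1.
    int levels = 1 + static_cast<int>(std::floor(std::log2(std::max(width, height))));

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    // Allocate every level once; size and format are now fixed for the
    // lifetime of the texture, so no later expansion is needed.
    glTexStorage2D(GL_TEXTURE_2D, levels, GL_RGBA8, width, height);

    // Levels are then filled with glTexSubImage2D (or glGenerateMipmap).
    return tex;
}
```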

This topic is closed to new replies.
