
OpenGL loading in a second thread causing stutter in main thread?

Started by
20 comments, last by wintertime 3 years, 5 months ago

Well, no buffer upload doesn't mean you're not communicating with the GPU. Perhaps put a timer around your main thread and see what happens. glUniform calls could lead to a stall as well, since they force a CPU/GPU sync.

https://developer.nvidia.com/sites/default/files/akamai/gameworks/events/gdc14/AvoidingCatastrophicPerformanceLoss.pdf

Let me correct myself: glUniform does not create a sync point; it just enters a command into the queue. The stall must be coming from somewhere else.
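
As a rough sketch of the timer suggestion above, assuming a conventional C++ render loop (`renderFrame()` here is a placeholder for whatever the main thread already does each frame):

```cpp
#include <chrono>
#include <cstdio>

void renderFrame(); // placeholder for the existing per-frame draw code

void runLoop()
{
    using clock = std::chrono::steady_clock;
    auto previous = clock::now();

    for (;;)
    {
        renderFrame();

        auto now = clock::now();
        double ms = std::chrono::duration<double, std::milli>(now - previous).count();
        previous = now;

        // Flag frames that blow well past a 60 Hz budget (~16.6 ms),
        // which is where the stutter should show up.
        if (ms > 25.0)
            std::printf("long frame: %.2f ms\n", ms);
    }
}
```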


mr.otakhi said:

The bottleneck could be the CPU/GPU communication: the glBindBuffer and glBufferData calls. If you only have one GPU, both the main thread and the secondary thread are sharing that single communication pipe.

This is what I'm starting to suspect. But just to confirm I'm thinking clearly on this: say my main thread is looping, calling glDrawElements, and my secondary thread is loading data from a large image file into a texture buffer. While the secondary thread is sending the texture data to the GPU, would any call being made to glDrawElements then be slowed down because of the saturation of the pipe between CPU and GPU? Similar to, say, trying to browse the internet while you're uploading a large file that's maxing out your upload bandwidth?

whitwhoa said:

mr.otakhi said:

The bottleneck could be the CPU/GPU communication: the glBindBuffer and glBufferData calls. If you only have one GPU, both the main thread and the secondary thread are sharing that single communication pipe.

This is what I'm starting to suspect. But just to confirm I'm thinking clearly on this: say my main thread is looping, calling glDrawElements, and my secondary thread is loading data from a large image file into a texture buffer. While the secondary thread is sending the texture data to the GPU, would any call being made to glDrawElements then be slowed down because of the saturation of the pipe between CPU and GPU? Similar to, say, trying to browse the internet while you're uploading a large file that's maxing out your upload bandwidth?

I think if that were the case, every single OpenGL app that uses streaming would end up having the exact same issue you are having. My application uses a shared context for resource upload in a dedicated thread, and I have not experienced the issue you are seeing. The application streams terrain elevation and color texture data as you move around, as well as mesh geometry, etc. An OpenGL context has its own command queue, so to speak. While the dispatching may not be parallel, as that is up to the driver implementation (but let's assume it's not), OpenGL still has other means to transfer data asynchronously, to avoid the issue you are describing above.
In the case you mention above:
1. Is that texture being used in any current draw call? Huge issue if it is.
2. How are you guaranteeing that this texture has finished uploading before using it on the main context? If you are depending on the gl* command returning, then that approach is flawed.
You posted the link to your GPU class, but that in itself is not enough to tell how the class is being used. The main application code that uses that class to do the per-frame draw will be needed.
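
A minimal sketch of what point #2 could look like with a fence object (assuming shared contexts and OpenGL 3.2+; the function names here are illustrative, not from the engine):

```cpp
#include <GL/glew.h> // or whichever GL loader the project already uses

// Loader thread: call after the last glTexImage2D/glTexSubImage2D for the texture.
GLsync insertUploadFence()
{
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush(); // ensure the fence (and the preceding uploads) are actually submitted
    return fence;
}

// Main thread: poll before the first draw that samples the texture.
bool uploadFinished(GLsync fence)
{
    GLenum status = glClientWaitSync(fence, 0, 0); // zero timeout: poll, don't block
    if (status == GL_ALREADY_SIGNALED || status == GL_CONDITION_SATISFIED)
    {
        glDeleteSync(fence);
        return true;
    }
    return false;
}
```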

cgrant said:

I think if that were the case, every single OpenGL app that uses streaming would end up having the exact same issue you are having. My application uses a shared context for resource upload in a dedicated thread, and I have not experienced the issue you are seeing. The application streams terrain elevation and color texture data as you move around, as well as mesh geometry, etc. An OpenGL context has its own command queue, so to speak. While the dispatching may not be parallel, as that is up to the driver implementation (but let's assume it's not), OpenGL still has other means to transfer data asynchronously, to avoid the issue you are describing above.
In the case you mention above:
1. Is that texture being used in any current draw call? Huge issue if it is.
2. How are you guaranteeing that this texture has finished uploading before using it on the main context? If you are depending on the gl* command returning, then that approach is flawed.
You posted the link to your GPU class, but that in itself is not enough to tell how the class is being used. The main application code that uses that class to do the per-frame draw will be needed.

Thanks for clarifying that! To answer your questions:

  1. The textures being uploaded on the load thread are not currently used in the draw call. Each “Scene” is its own encapsulated entity, and the context being used for it does not know of the context being used on the main draw thread or any of its elements.
  2. You are correct that I am not currently checking to ensure the texture has finished uploading before using it on the main context; I am simply depending on the gl* command returning. However, once I switch the contexts the issue is gone, and I'm not seeing any artifacts or incomplete textures (which doesn't necessarily mean there are none, so this is something I'll probably need to look into as well).

The full engine code can be found in this github repo/branch. However, that is just the core engine code; I will need to upload a simple usage example, as the current project would be too unwieldy to navigate. I will work on that and update this post once I have something posted.

*EDIT*

I have uploaded a stripped-down version of my existing project which simply includes main() as a bootstrapper and the two `Scenes` that are switched between. Within each scene class, note the outerLoop() member where vel::App::get().loadNextScene() is called. This is the method which initiates the secondary thread for loading the next scene.

I will check out the codebase, or the example when posted, when I get a chance. Also, to clarify my previous question #2: this only matters if the textures are created/modified in a different context. If they are within the same context, then OpenGL will ensure the commands are dispatched in the order submitted, meaning that the texture will be ‘complete’ by the time any subsequent GL call that uses said texture is submitted. In addition, if you are making a context current every frame or even periodically (wglMakeCurrent, glXMakeCurrent), you are going to see hiccups in your frames, as making a context current will flush the previous context per the docs for WGL: https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-wglmakecurrent. The main idea is that this really should be a one-time deal. This in turn forces one to think more about the intent of their application and how to structure it to abide by that limitation.
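
A sketch of the “one time deal” idea on the loader thread, using GLFW's shared-context support purely as an example (the engine's actual windowing layer may differ):

```cpp
#include <GLFW/glfw3.h>
#include <thread>

void loadSceneResources(); // placeholder for the actual upload work

void startLoaderThread(GLFWwindow* mainWindow)
{
    // An invisible window created solely to get a context that shares
    // objects (textures, buffers) with the main context.
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);
    GLFWwindow* loaderContext = glfwCreateWindow(1, 1, "", nullptr, mainWindow);

    std::thread([loaderContext]() {
        // Made current exactly once for the lifetime of the thread,
        // not per upload and not per frame.
        glfwMakeContextCurrent(loaderContext);
        loadSceneResources();
        glfwMakeContextCurrent(nullptr);
    }).detach();
}
```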

cgrant said:

I will check out the codebase, or the example when posted, when I get a chance. Also, to clarify my previous question #2: this only matters if the textures are created/modified in a different context. If they are within the same context, then OpenGL will ensure the commands are dispatched in the order submitted, meaning that the texture will be ‘complete’ by the time any subsequent GL call that uses said texture is submitted. In addition, if you are making a context current every frame or even periodically (wglMakeCurrent, glXMakeCurrent), you are going to see hiccups in your frames, as making a context current will flush the previous context per the docs for WGL: https://docs.microsoft.com/en-us/windows/win32/api/wingdi/nf-wingdi-wglmakecurrent. The main idea is that this really should be a one-time deal. This in turn forces one to think more about the intent of their application and how to structure it to abide by that limitation.

Updated the post above with a link to a stripped-down version which shows a basic usage example to go along with the engine code.

To clarify, I am only switching between contexts once in order to load a new scene. For example, think of a loading screen showing the user a progress bar or some animation, which needs to remain responsive while the next scene is loaded in the background. Once the main thread's context has been switched with the loading thread's context, no further context switching (or asset loading) of any kind happens until the next scene load is initiated. I may not have made that fact clear in the beginning.

If you feel like checking out the engine/example code that would be wonderful; if not, no worries. I know digging through someone else's code isn't the most pleasant thing to do sometimes. I greatly appreciate the assistance up to this point.

If you do decide to take a look, know that this is by no means a final solution, especially this branch, as I cut it simply to play around with multiple-context loading and wound up having to implement some things I'm not so proud of just to see it work. I plan on refactoring once I have this issue figured out.

To anyone who might stumble across this in the future: my issue turned out to be that I was auto-generating my mipmap levels in the second thread by calling glGenerateMipmap(). This turns out to be quite an expensive operation, so even when it's called from a separate thread (since you only have one GPU) it taxes the GPU enough to drastically degrade overall performance.

I solved this by manually creating the mipmaps for each texture and loading them in with subsequent calls to `glTexImage2D()`.
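
For reference, a sketch of that approach: one `glTexImage2D()` call per pre-computed level (the `MipLevel` struct is illustrative, not from the engine):

```cpp
#include <GL/glew.h>
#include <vector>

struct MipLevel
{
    int width;
    int height;
    const unsigned char* pixels; // RGBA8 data for this level
};

GLuint createTextureFromMipChain(const std::vector<MipLevel>& levels)
{
    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    // Upload each pre-computed level explicitly; level 0 is the base image.
    for (size_t level = 0; level < levels.size(); ++level)
    {
        const MipLevel& m = levels[level];
        glTexImage2D(GL_TEXTURE_2D, static_cast<GLint>(level), GL_RGBA8,
                     m.width, m.height, 0, GL_RGBA, GL_UNSIGNED_BYTE, m.pixels);
    }

    // Tell GL the full chain is present so mipmapped filtering works.
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAX_LEVEL,
                    static_cast<GLint>(levels.size()) - 1);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}
```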

whitwhoa said:
This turns out to be quite an expensive operation, so even when it's called from a separate thread (since you only have one GPU) it taxes the GPU enough to drastically degrade overall performance.

I'm a little surprised by that, as GPU generation of mipmaps should be fast - it's a trivial box filter by default.

I wonder whether the stutters are actually caused by lock contention in the GPU driver between the two different threads. Have you tried moving the glGenerateMipmap calls to the main thread, after loading is complete? That would eliminate the possibility of weird thread handling in the driver.

Tristam MacDonald. Ex-BigTech Software Engineer. Future farmer. [https://trist.am]
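
A sketch of that suggestion, assuming the loader thread hands finished texture IDs back to the main thread (the handoff mechanism itself is up to the application):

```cpp
#include <GL/glew.h>

// Main thread, e.g. right after the scene swap, once per texture the
// loader thread uploaded at base level only.
void finalizeTextureOnMainThread(GLuint textureId)
{
    glBindTexture(GL_TEXTURE_2D, textureId);
    glGenerateMipmap(GL_TEXTURE_2D); // now runs on the main context, not the loader thread
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR_MIPMAP_LINEAR);
}
```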

Perhaps WebGPU tries to solve this very problem for multi-threaded apps running on multi-core CPUs. Clearly, WebGL's implementation lacks the fine granularity needed for better parallelism.

https://hacks.mozilla.org/2020/04/experimental-webgpu-in-firefox/

What version of OpenGL are you working with? OpenGL 3.0 or below, which WebGL 2.0 derives from, seems to have these issues.

Overall, this separation will allow for complex applications on the web to stream data in one or more workers and create any associated GPU resources for it on the fly. Meanwhile, the same application could be recording work on multiple workers, and eventually submit it all together to GPUQueue. This matches multi-threading scenarios of native graphics-intensive applications and allows for high utilization of multi-core processors.

swiftcoder said:

whitwhoa said:
This turns out to be quite an expensive operation, so even when it's called from a separate thread (since you only have one GPU) it taxes the GPU enough to drastically degrade overall performance.

I'm a little surprised by that, as GPU generation of mipmaps should be fast - it's a trivial box filter by default.

I am stabbing in the dark here, but the OP may not have declared the texture up front as one with mip levels? In that case, memory copying and expansion take place.
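
A sketch of declaring the mip chain up front with immutable storage (`glTexStorage2D`, GL 4.2 or ARB_texture_storage), which avoids the reallocate-and-copy scenario described above:

```cpp
#include <GL/glew.h>
#include <algorithm>
#include <cmath>

GLuint createImmutableTexture(int width, int height)
{
    // Number of levels for a full mip chain down to 1x1.
    int levels = 1 + static_cast<int>(std::floor(std::log2(std::max(width, height))));

    GLuint tex = 0;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);

    // Allocate every level once; size and format are now fixed for the
    // lifetime of the texture, so no later expansion is needed.
    glTexStorage2D(GL_TEXTURE_2D, levels, GL_RGBA8, width, height);

    // Levels are then filled with glTexSubImage2D (or glGenerateMipmap).
    return tex;
}
```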

This topic is closed to new replies.
