Software rendering resources and optimization

22 comments, last by Geri 2 years, 7 months ago

Great post. Yes, I do recommend doing the single-thread case initially. It will be helpful for learning about profiling and understanding how memory access patterns affect the speed of the CPU. You can start with a single thread and then move on.
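As a quick illustration of the memory access point, here is a minimal C++ sketch (just an example, not anyone's engine code): both functions fill the same framebuffer, but the row-major loop walks memory sequentially and is typically much faster because it matches the cache-line layout of the pixels.

    #include <cstdint>
    #include <vector>

    // Fill the framebuffer row by row: the inner loop walks contiguous memory.
    void fill_row_major(std::vector<uint32_t>& fb, int w, int h, uint32_t color)
    {
        for (int y = 0; y < h; ++y)
            for (int x = 0; x < w; ++x)
                fb[y * w + x] = color;
    }

    // Fill the framebuffer column by column: the inner loop strides w*4 bytes
    // per pixel, which thrashes the cache at typical resolutions.
    void fill_column_major(std::vector<uint32_t>& fb, int w, int h, uint32_t color)
    {
        for (int x = 0; x < w; ++x)
            for (int y = 0; y < h; ++y)
                fb[y * w + x] = color;
    }

    int main()
    {
        const int w = 1920, h = 1080;
        std::vector<uint32_t> fb(w * h);
        fill_row_major(fb, w, h, 0xFF000000u);
        fill_column_major(fb, w, h, 0xFFFFFFFFu);
        return 0;
    }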


If you're learning, start single-threaded. If you're building something big, though, start multi-threaded, because retrofitting multithreading is really hard and likely to result in hard-to-find race conditions.
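To make the "design it in up front" point concrete, here is a minimal sketch of one race-free structure: give each thread an exclusive horizontal band of the framebuffer, so pixel writes never need synchronization. The shading below is a placeholder, just to show the partitioning.

    #include <algorithm>
    #include <cstdint>
    #include <thread>
    #include <vector>

    // Each thread writes only rows [y0, y1) and never touches another
    // thread's band, so there is nothing to race on.
    void shade_band(uint32_t* fb, int w, int y0, int y1)
    {
        for (int y = y0; y < y1; ++y)
            for (int x = 0; x < w; ++x)
                fb[y * w + x] = 0xFF000000u | uint32_t(x ^ y); // placeholder "shading"
    }

    void render_threaded(std::vector<uint32_t>& fb, int w, int h)
    {
        unsigned n = std::max(1u, std::thread::hardware_concurrency());
        std::vector<std::thread> workers;
        for (unsigned i = 0; i < n; ++i)
        {
            int y0 = h * int(i) / int(n);
            int y1 = h * int(i + 1) / int(n);
            workers.emplace_back(shade_band, fb.data(), w, y0, y1);
        }
        for (auto& t : workers)
            t.join();
    }

    int main()
    {
        const int w = 1920, h = 1080;
        std::vector<uint32_t> fb(w * h);
        render_threaded(fb, w, h);
        return 0;
    }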

I developed 3D software engines in the '90s, then moved on to DX/OpenGL/Vulkan.

I want to learn 2D/3D software rendering.
My problem now is how to optimize this small lib's algorithms before going 3D,
but I'm asking for SIMD and multithreading resources so I can make use of newer tech and benefit from it while building the software renderer.

If you want to learn 3D software rendering, don't even think about optimisations at the start. Writing a software engine is very difficult, so I would focus on that first. If you ever get it rendering 3D models + materials + lighting + depth buffering + cameras + clipping etc., then you can start profiling the code to see where the bottlenecks are.
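For the profiling step, a minimal sketch of the coarse approach (the dummy work below just stands in for whatever stage you wrap; in a renderer it would be your own transform, raster, or shading pass): per-stage wall-clock numbers are usually enough to locate the first bottlenecks before reaching for a sampling profiler.

    #include <chrono>
    #include <cstdio>

    // Run a stage and return how long it took in milliseconds.
    template <typename F>
    double time_ms(F&& stage)
    {
        auto t0 = std::chrono::steady_clock::now();
        stage();
        auto t1 = std::chrono::steady_clock::now();
        return std::chrono::duration<double, std::milli>(t1 - t0).count();
    }

    int main()
    {
        // Dummy work standing in for a real pipeline stage.
        volatile long sink = 0;
        double ms = time_ms([&] { for (long i = 0; i < 10000000; ++i) sink = sink + i; });
        std::printf("stage took %.2f ms\n", ms);
        return 0;
    }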

If you get to the optimisation stage, I would read Michael Abrash's Graphics Programming Black Book (Quake) before doing anything.

@karimhamdallah Did you try https://www.openswr.org/ ?


@JoeJ Two things gave this away. The first was when I switched to framebuffer code that was far simpler and used far fewer cycles (pixels aligned to 32 bits instead of the 24-bit bitwise magic). I expected some speed gain from the simpler pixel code, but instead the speed fell by 15-20% or so, because a larger memory area was being read and written.

Of course, I reverted to the original algorithm after this.
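For anyone who hasn't seen that trade-off, the two write paths roughly look like this (an illustrative sketch only, not the actual engine code): the 32-bit path is one aligned store per pixel but makes the framebuffer a third larger, while the packed 24-bit path does more ALU and addressing work per pixel but moves less memory.

    #include <cstdint>

    // 32-bit path: one aligned 4-byte store per pixel; simple pixel code,
    // but 33% more bytes read and written overall.
    inline void put_pixel_32(uint32_t* fb, int w, int x, int y, uint32_t rgb)
    {
        fb[y * w + x] = rgb;
    }

    // Packed 24-bit path: three byte stores per pixel; more work per pixel,
    // but a tighter framebuffer and less memory traffic.
    inline void put_pixel_24(uint8_t* fb, int w, int x, int y, uint32_t rgb)
    {
        uint8_t* p = fb + (y * w + x) * 3;
        p[0] = uint8_t(rgb);         // blue
        p[1] = uint8_t(rgb >> 8);    // green
        p[2] = uint8_t(rgb >> 16);   // red
    }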

Later I bought a CPU that had much less cache but about the same clock speed (2 GHz with 3 MB L2 and 6 MB L3 cache, vs 2.1 GHz with 2 MB L2 + 2 MB L3 cache) and approximately the same architecture. Cutting the cache roughly in half resulted in a massive 50% speed drop.

So basically, the simplest way to find this out is to buy CPUs with various cache sizes from the same architecture, or to adjust the code toward more or less memory pressure versus simpler or more complex inner-loop code, and see which direction the speed moves; that tells you whether the cache is the bottleneck. For me, going mostly with tighter memory representations and hammering the ALU a bit harder gave the best overall results in this scenario, in most cases.
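A minimal sketch of the "adjust the code" route (no hardware shopping needed), assuming streaming reads are roughly representative of the renderer's access pattern: run the same amount of work over working sets of increasing size and watch where the throughput falls off; the knees tend to line up with the cache levels.

    #include <chrono>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main()
    {
        // Working-set sizes chosen to straddle typical L2/L3 capacities.
        const unsigned sizes_kib[] = { 256, 1024, 4096, 16384, 65536 };

        for (unsigned kib : sizes_kib)
        {
            std::vector<uint32_t> buf(kib * 1024u / sizeof(uint32_t), 1u);
            const int passes = 64;
            uint64_t sum = 0;

            auto t0 = std::chrono::steady_clock::now();
            for (int p = 0; p < passes; ++p)
                for (uint32_t v : buf)
                    sum += v;                 // simple streaming read
            auto t1 = std::chrono::steady_clock::now();

            double s = std::chrono::duration<double>(t1 - t0).count();
            double gbs = double(buf.size()) * sizeof(uint32_t) * passes / s / 1e9;
            std::printf("%6u KiB working set: %.2f GB/s (checksum %llu)\n",
                        kib, gbs, (unsigned long long)sum);
        }
        return 0;
    }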

