Software rendering resources and optimization


Software renderers in the 90s didn't have or need SIMD. What they did have was:

  • Very low resolutions.
  • A very simple lighting model, or none at all.
  • 8-bit indexed color.
  • In many cases, specialized rasterizers for floors and walls instead of or in addition to arbitrary polygon rasterizers.
  • Very skilled programmers who specialized in optimizing software renderers.

Very true. I'd like to add that things were also done differently in the 80s and 90s.

For example, the C64 had hardware sprites. There were also special hardware registers for smooth scrolling, and sometimes there were hardware layers, allowing for more tricks.


The Amiga had a hardware blitter, great for moving 2D blocks of pixels around without the CPU breaking a sweat.

But in those days, a lot of tricks were still needed to keep up with 50 FPS, since the CPU ran at only a few MHz (not GHz).

Nowadays the hardware is super capable, but not really targeted at making 2D 50 FPS games. GPUs are mainly there to help with 3D rendering. (Of course you can “abuse” the GPU and let it do other things, like training neural networks or calculating 2D graphics.)

Chao55 / Retro Games, Programming, AI, Space and Robots

Currently working on HyperPyxel paint program - http://www.hyperpyxel.com and an asteroids clone under my "Game1" javascript game "engine".


Probably the best-known post about SIMD software rendering was about triangle rasterization, written by Nicolas Capens.
But I can't find it anymore; it seems to be gone.
IIRC, it rendered triangles in blocks, and each pixel evaluated all three edge (line-side) equations. Because that was parallelized, it was faster than the traditional edge walk over scanlines.
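The core of that half-space approach is simple enough to sketch from memory. What follows is my own minimal, unoptimized reconstruction of the idea, not Capens' actual code; names and conventions are mine. It assumes integer screen coordinates, clockwise winding with y pointing down (flip the >= tests for the opposite convention), and a triangle that lies inside the framebuffer.

#include <stdint.h>

/* Signed area test: non-negative when p is on the inner side of edge a->b. */
static int edge(int ax, int ay, int bx, int by, int px, int py)
{
    return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
}

void fill_triangle(uint32_t *fb, int pitch, uint32_t color,
                   int x0, int y0, int x1, int y1, int x2, int y2)
{
    /* Bounding box of the triangle. */
    int minx = x0 < x1 ? (x0 < x2 ? x0 : x2) : (x1 < x2 ? x1 : x2);
    int maxx = x0 > x1 ? (x0 > x2 ? x0 : x2) : (x1 > x2 ? x1 : x2);
    int miny = y0 < y1 ? (y0 < y2 ? y0 : y2) : (y1 < y2 ? y1 : y2);
    int maxy = y0 > y1 ? (y0 > y2 ? y0 : y2) : (y1 > y2 ? y1 : y2);

    for (int y = miny; y <= maxy; ++y)
        for (int x = minx; x <= maxx; ++x)
            /* A pixel is inside when all three edge tests pass. */
            if (edge(x0, y0, x1, y1, x, y) >= 0 &&
                edge(x1, y1, x2, y2, x, y) >= 0 &&
                edge(x2, y2, x0, y0, x, y) >= 0)
                fb[y * pitch + x] = color;
}

The point of the block/SIMD versions is that edge() is affine in x and y, so instead of re-evaluating it per pixel you can compute it once per block and step it incrementally, or evaluate several pixels at once.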

I also remember a powerful SW renderer, Pixomatic; Michael Abrash worked on it, among others. He later wrote about rasterization on Larrabee, and the papers are easy to find, e.g. https://www.cs.cmu.edu/afs/cs/academic/class/15869-f11/www/readings/abrash09_lrbrast.pdf Maybe this gives some good resources.

Oh… I remember a really good and recent resource: an open-source SW renderer, fast enough to run games with pretty detailed graphics, and it was posted here on this forum. But I can't give any name or search term. :( The project is on GitHub.

Thanks so much for this information, everybody. I think I've got what I need to make a decent rasterizer. There's no way I'm making it fully software; I should use the power of the GPU… I think I'm going to replace the old (scanline) rasterization algorithm with one far more suitable for parallelism, and use GPU threads via CUDA or OpenCL. Would that be a good idea to get my rasterizer good enough to at least show 2D graphics at a reasonable FPS?

I would suggest giving what @shaarigan said some more thought. Before worrying about low-level optimization, I would get a solid design down first, then optimize. Do some research into how ‘modern’ GPUs tackle the task; specifically, I would look at how a mobile GPU pipeline is structured, as there are multiple tiers of high-level optimization that can be done before delving into the likes of assembly, etc. Most mobile GPUs implement a tile-based rendering approach, which I think would definitely suit your software renderer.
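To make "tile-based" concrete, here is a rough sketch of the binning pass such a renderer might use. Everything here (the 64x64 tile size, struct and function names) is hypothetical illustration, not taken from any particular GPU; error handling is omitted for brevity.

#include <stdlib.h>

#define TILE_SHIFT 6  /* 64x64 pixel tiles */

typedef struct { int minx, miny, maxx, maxy; } BBox;

typedef struct {
    int *tri_indices;      /* indices of triangles overlapping this tile */
    int count, capacity;
} TileBin;

static void bin_push(TileBin *bin, int tri)
{
    if (bin->count == bin->capacity) {
        bin->capacity = bin->capacity ? bin->capacity * 2 : 16;
        bin->tri_indices = realloc(bin->tri_indices,
                                   bin->capacity * sizeof(int));
    }
    bin->tri_indices[bin->count++] = tri;
}

/* Pass 1: append each triangle to every tile its bounding box touches.
   Pass 2 (not shown) rasterizes one tile at a time, so the tile's pixels
   stay hot in cache while all of its triangles are drawn. */
void bin_triangles(const BBox *boxes, int tri_count,
                   TileBin *bins, int tiles_x, int tiles_y)
{
    for (int t = 0; t < tri_count; ++t) {
        int tx0 = boxes[t].minx >> TILE_SHIFT;
        int ty0 = boxes[t].miny >> TILE_SHIFT;
        int tx1 = boxes[t].maxx >> TILE_SHIFT;
        int ty1 = boxes[t].maxy >> TILE_SHIFT;
        if (tx0 < 0) tx0 = 0;
        if (ty0 < 0) ty0 = 0;
        if (tx1 >= tiles_x) tx1 = tiles_x - 1;
        if (ty1 >= tiles_y) ty1 = tiles_y - 1;
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx)
                bin_push(&bins[ty * tiles_x + tx], t);
    }
}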

Unless you really want to write a game engine, or have some very unusual problem to solve, get some existing game engine and go with that. This is a problem you no longer have to solve yourself.

My software renderer has no SIMD code whatsoever. I don't even see how it could be useful outside of boosting some calculations in the initial vertex transformation… That's not even the bottleneck.

Geri said:
I don't even see how it could be useful outside of boosting some calculations in the initial vertex transformation… That's not even the bottleneck.

To benefit from SIMD, every pixel can map to one SIMD lane, so you get 4 or more pixels for a similar number of instructions.
But like GPUs do, you may process pixels in blocks, and the pixels in a block that fall outside the triangle are 'wasted'. It's still a win if your triangles are not too small.
The other option would be mapping scanlines to SIMD rows of 4 pixels, so the waste drops to roughly half. I'm not sure when, and if, the non-optimal memory alignment matters then.
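A minimal SSE2 sketch of that scanline mapping, assuming integer edge functions of the form E(x,y) = A*x + B*y + C (the function and all names are my own illustration, and count is assumed to be a multiple of 4):

#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stdint.h>

/* Four horizontally adjacent pixels differ only by multiples of A,
   so each loop iteration tests and writes 4 pixels at once.
   e0/e1/e2 are the three edge values at the first pixel of the span,
   a0/a1/a2 their per-pixel x increments. */
void shade_span_sse2(uint32_t *row, int count,
                     int e0, int a0, int e1, int a1, int e2, int a2,
                     uint32_t color)
{
    const __m128i zero = _mm_setzero_si128();
    const __m128i col  = _mm_set1_epi32((int)color);

    for (int i = 0; i < count; i += 4) {
        /* Edge values for the 4 lanes (pixels i .. i+3). */
        __m128i v0 = _mm_setr_epi32(e0, e0 + a0, e0 + 2 * a0, e0 + 3 * a0);
        __m128i v1 = _mm_setr_epi32(e1, e1 + a1, e1 + 2 * a1, e1 + 3 * a1);
        __m128i v2 = _mm_setr_epi32(e2, e2 + a2, e2 + 2 * a2, e2 + 3 * a2);

        /* A lane is outside the triangle if any edge value is negative. */
        __m128i out = _mm_or_si128(_mm_or_si128(
                          _mm_cmplt_epi32(v0, zero),
                          _mm_cmplt_epi32(v1, zero)),
                          _mm_cmplt_epi32(v2, zero));

        /* Write color to inside lanes, keep the old pixel elsewhere. */
        __m128i old = _mm_loadu_si128((const __m128i *)(row + i));
        __m128i res = _mm_or_si128(_mm_and_si128(out, old),
                                   _mm_andnot_si128(out, col));
        _mm_storeu_si128((__m128i *)(row + i), res);

        /* Step all three edge functions 4 pixels to the right. */
        e0 += 4 * a0; e1 += 4 * a1; e2 += 4 * a2;
    }
}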

I think micromanaging the data to fit a SIMD-reliant algorithm would be slower than doing the rasterization from raw muscle. I am getting around 100-200 FPS at 1080p with 100k-200k polygons on a 6-core i7, and that's relatively dumb but optimized classic C code with threads. The code already hits cache bandwidth as the most limiting factor, so SIMD wouldn't help.

On the other hand, I could use some trickery to increase the polygon count, because the situation there is not perfect: above 200k polygons, the speed starts to nosedive. But these are year-old results; I have made some optimizations since then, so maybe I have already reached 1 million since taking these measurements. This is enough for me, because I typically use only a few tens of thousands of polygons in a scene, but it would be problematic if I wanted to use more.

To answer the OP's original question: with the game industry's increased demands for correct and beautiful 3D graphics came an increase in computation demand. New effects like per-pixel graphics computations (so-called pixel shaders), as well as increased screen resolutions, are the main factors. But I would not say that decent software rendering is impossible in today's world. I am not the right person, though, if you want to ask about the optimization shortcuts of the early days that involved trickster programming: taking a lot of shortcuts, not solving the math properly, going by gut feeling, etc.

Geri said:
I think micromanaging the data to fit a SIMD-reliant algorithm would be slower than doing the rasterization from raw muscle. I am getting around 100-200 FPS at 1080p with 100k-200k polygons on a 6-core i7, and that's relatively dumb but optimized classic C code with threads. The code already hits cache bandwidth as the most limiting factor, so SIMD wouldn't help.

How do you measure that cache bandwidth is saturated? Is there some VS profiler option showing this? I'm a bit lazy about figuring such things out…
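For what it's worth, one crude way to check without a dedicated profiler counter is to benchmark raw memory bandwidth yourself and compare it against the bytes your renderer actually touches per frame; if the two are close, you are memory-bound. A minimal POSIX C sketch (buffer size and names are my own, not from the thread):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    /* Use a buffer much larger than the last-level cache. */
    const size_t n = 512u << 20;  /* 512 MiB */
    char *src = malloc(n), *dst = malloc(n);
    if (!src || !dst) return 1;
    memset(src, 1, n);  /* touch the pages up front */
    memset(dst, 0, n);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    memcpy(dst, src, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    /* 2*n bytes cross the bus: n read + n written. */
    printf("~%.1f GB/s\n", 2.0 * n / s / 1e9);
    free(src);
    free(dst);
    return 0;
}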

