How to multithread, conceptually

Started by
12 comments, last by frob 1 year, 5 months ago

frob said:
In games usually there are too few people who have the skills to implement them, those who know how to properly partition the problems and map them to a parallel solution. Typically those people will implement a basic task-based system and throw them to the masses.

Yeah, but it should not be like this. And I don't agree with the general assumption that MT is hard, and that only experts could do it without bugs that would be almost impossible to find.
No, it's not hard; it's simply how modern HW works, so everybody should use it. And to get there, it needs to be easy to use. Now I don't say C++ threading is not enough, but features like std::async are mostly useless when they spawn a new thread for every async task (though maybe I've got some things wrong). The current features are mostly high-level concepts, which makes the topic appear more complex than it is, because you need to understand the concepts first, and which compromises performance as well.
A simple, standard job system would be a much better tool. (AFAIK something like this is announced for future standards, but it's not there yet, or I've missed it.)
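To illustrate what "a simple job system" means here, a minimal sketch might look like the following: a fixed pool of persistent worker threads pulling jobs from a queue, so no thread is created per task (unlike a naive std::async). The `JobPool` name and `run_demo` helper are hypothetical; a production system would add work stealing, job handles, and dependencies.

```cpp
#include <atomic>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Minimal fixed-size thread pool: jobs are queued and executed by persistent
// worker threads, so no thread is spawned per task.
class JobPool {
public:
    explicit JobPool(unsigned n) {
        for (unsigned i = 0; i < n; ++i)
            workers_.emplace_back([this] { worker_loop(); });
    }
    ~JobPool() {
        { std::lock_guard<std::mutex> lk(m_); done_ = true; }
        cv_.notify_all();
        for (auto& t : workers_) t.join();
    }
    void submit(std::function<void()> job) {
        { std::lock_guard<std::mutex> lk(m_); jobs_.push(std::move(job)); }
        cv_.notify_one();
    }
private:
    void worker_loop() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lk(m_);
                cv_.wait(lk, [this] { return done_ || !jobs_.empty(); });
                if (done_ && jobs_.empty()) return;  // drain queue before exit
                job = std::move(jobs_.front());
                jobs_.pop();
            }
            job();
        }
    }
    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> jobs_;
    std::vector<std::thread> workers_;
    bool done_ = false;
};

// Hypothetical demo: 100 tiny jobs on 4 workers.
int run_demo() {
    std::atomic<int> counter{0};
    {
        JobPool pool(4);
        for (int i = 0; i < 100; ++i)
            pool.submit([&counter] { counter.fetch_add(1, std::memory_order_relaxed); });
    }   // destructor drains the queue and joins the workers
    return counter.load();
}
```

The point of the sketch is the cost model: thread creation happens once, and submitting a job is just a lock, a push, and a wake-up.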

Basically what I want is to make parallel programming the standard way of programming, instead of keeping it a matter for experts and seniors only. HW has changed; programmers and languages need to change too.
On CPU the situation is not that bad at all, but on GPU it could not be any worse. We still have no cross-platform / cross-vendor standard, although even phones have GPUs. Everybody has the HW, but only a minority of programmers actually uses it.

Now we could argue that parallel programming is rarely applicable, so there is no need to make it more accessible.
But to me it feels more like the opposite: I could use it for almost everything i do, but due to obstacles, i do it only for the important things.
And the obstacles are not my inability to imagine what side effects multiple programs running at the same time might cause; the obstacles exist only because parallel programming is hardly accessible as a tool.


I've been struggling with this for my metaverse client, a rewrite of the client for Second Life / Open Simulator. I'm using Rend3 → WGPU → Vulkan. Everything is retained mode. The concurrency approach uses these threads:

  • The render thread. This just calls Rend3's render function over and over. Highest priority thread.
  • The update thread. This takes in messages from the server and updates the world.
  • The movement thread. For things that are moving, this advances their positions and runs skeleton animations on a per-frame basis. It's unblocked when the render thread has started to render a frame, so it runs in parallel with rendering, queuing up transform matrix updates to be applied all at once between frames.
  • The asset loader threads. These are fetching assets from the asset servers, decompressing them, and sending out updates to objects when an asset arrives.
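The render/movement handshake in the list above can be sketched with a condition variable. The actual client is Rust, but the idea is language-neutral; this C++ sketch uses a hypothetical `FrameGate` in which the movement thread is unblocked when a frame starts and its results are picked up between frames:

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Conceptual sketch: movement work is unblocked each time the render thread
// begins a frame, and its results are applied between frames.
class FrameGate {
public:
    // Render side: announce a new frame, then wait for the movement pass
    // to finish before starting the next frame (updates land between frames).
    void begin_frame() {
        std::unique_lock<std::mutex> lk(m_);
        frame_ready_ = true;
        cv_.notify_all();
        cv_.wait(lk, [this] { return movement_done_; });
        movement_done_ = false;
    }
    // Movement side: block until a frame starts, run fn, signal completion.
    template <class Fn>
    void run_movement_pass(Fn fn) {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return frame_ready_; });
        frame_ready_ = false;
        lk.unlock();
        fn();                        // runs while the frame is "rendering"
        lk.lock();
        movement_done_ = true;
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    bool frame_ready_ = false;
    bool movement_done_ = false;
};

// Hypothetical demo: three frames, one movement pass each.
int run_demo_frames() {
    FrameGate gate;
    int updates = 0;
    std::thread movement([&] {
        for (int i = 0; i < 3; ++i)
            gate.run_movement_pass([&] { ++updates; });
    });
    for (int i = 0; i < 3; ++i)
        gate.begin_frame();          // "render" a frame
    movement.join();
    return updates;
}
```

This is a deterministic ping-pong for clarity; the real client presumably lets rendering proceed while movement runs and only synchronizes the transform-matrix handoff.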

The “game logic” is server side, and is completely asynchronous to the client. The client has to smooth out motion between update messages from the server. Because this is a big-world system with no preloaded content, there are a lot of object updates and content coming in. It's all about managing a flood of incoming content while keeping the frame rate up, so the render thread waits for almost nothing.

The big problem right now is that Rend3 has an internal locking problem. It's supposed to be possible to load new meshes and textures into the GPU during rendering. Vulkan and WGPU support that, but Rend3 stalls due to a global lock. This is supposed to be fixed in a later version of Rend3.

Perhaps a difficulty here is that not all multithreading is the same.

There is parallel processing with trivially parallel tasks, things like running graphics shaders or running unrelated processing tasks. These can be done on any core in any order without regard to whatever else is going on, and don't require a lot of thought.
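A trivially parallel task can be sketched as slicing a buffer across worker threads, where each worker touches only its own slice, in any order, with no communication. The `square_all` helper is hypothetical:

```cpp
#include <algorithm>
#include <thread>
#include <vector>

// Each worker squares its own slice; no ordering or communication is needed,
// which is what makes the task trivially (embarrassingly) parallel.
std::vector<int> square_all(std::vector<int> v, unsigned workers = 4) {
    std::vector<std::thread> pool;
    const size_t chunk = (v.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&v, w, chunk] {
            const size_t begin = std::min(v.size(), w * chunk);
            const size_t end = std::min(v.size(), begin + chunk);
            for (size_t i = begin; i < end; ++i)
                v[i] *= v[i];        // disjoint slices: no contention
        });
    }
    for (auto& t : pool) t.join();
    return v;
}
```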

There is parallel processing with dependencies and communications in tasks. These require scheduling, they require data communication paths, they can suffer from issues regarding data flows and deadlocks and livelocks. Usually these require some specialty skills to understand the flow mapping and backfilling of work, but once you've got a good library in place they are easily managed.
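A toy version of tasks with dependencies can be expressed with futures, where the dependency edges are the `get()` calls: the combine step cannot start before its inputs are ready. (This sketch uses std::async for brevity; a real job system would schedule onto a pool instead.)

```cpp
#include <future>

// Two independent tasks feed a third; futures express the dependency edges.
int run_task_graph() {
    std::future<int> a = std::async(std::launch::async, [] { return 2; });
    std::future<int> b = std::async(std::launch::async, [] { return 3; });
    // The combine step depends on a and b: get() blocks until each is done.
    return a.get() * b.get();
}
```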

There is parallel processing with cooperative work in tasks. These often involve scatter and gather techniques, partitioning problem spaces, potentially offer gains like superlinear speedup, and encounter harder problems like cache invalidation and resource contention.
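Scatter/gather can be sketched as a partitioned sum: scatter slices of the input to workers that each write a private partial result, then gather the partials into the final answer. The `parallel_sum` helper is hypothetical:

```cpp
#include <algorithm>
#include <numeric>
#include <thread>
#include <vector>

// Scatter: partition the input range across workers. Gather: combine the
// per-worker partial sums. Each worker writes only its own slot, so there is
// no contention on shared data during the parallel phase.
long long parallel_sum(const std::vector<int>& v, unsigned workers = 4) {
    std::vector<long long> partial(workers, 0);
    std::vector<std::thread> pool;
    const size_t chunk = (v.size() + workers - 1) / workers;
    for (unsigned w = 0; w < workers; ++w) {
        pool.emplace_back([&, w] {
            const size_t begin = std::min(v.size(), w * chunk);
            const size_t end = std::min(v.size(), begin + chunk);
            partial[w] = std::accumulate(v.begin() + begin, v.begin() + end, 0LL);
        });
    }
    for (auto& t : pool) t.join();
    return std::accumulate(partial.begin(), partial.end(), 0LL);  // gather
}
```

The per-worker slots also sidestep one of the cache problems mentioned above, though a careful implementation would pad them to avoid false sharing of a cache line.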

There is no universal multithreading system when it comes to work loads, the work being done and the results expected vary tremendously.

Along the same lines, not all multiprocessing hardware is the same, nor do operating systems handle it the same way.

When it comes to hardware, there are the multi-core chips commonly seen on desktops: a single CPU with multiple cores sharing a hierarchy of caches. There is also hardware with multiple chips; plenty of server-class and occasionally premium desktop computers support multiple physical CPUs, each with its own set of cores and its own cache hierarchy, with communication between them handled automatically by the operating system. And there are larger computers: server farms, clusters, and bigger systems with many independent and interdependent buses, some connected directly by a motherboard, some by direct interlinks, some by network switches, all of them treated as a single computer.

These are just a few reasons why there are so many parallel processing libraries and multithreading libraries, because so many systems do them differently. There is no universal multithreading system when it comes to hardware nor operating systems, the underlying calls and structure vary tremendously.

Standardization is hard, especially when it comes to language standards. It is far easier, and far more powerful, to have the functionality provided by libraries and tools that are more local to the problem.

When you're talking about C++ programming standards and standard libraries, you're talking about systems that must work on desktop computers, mainframes, large industrial supercomputers, space satellites, FPGAs, DSP boards, and various tiny microcontrollers. These considerations weigh heavily on what gets included in the language standards, and they come up quite frequently in standards discussions.

Getting back more concretely to the questions about GPU multithreading, it's quite important to recognize that there are two sides: the CPU side doing one set of work and the GPU side doing another, with a layer of drivers and the hardware bus serving as a gateway between the two.

You can (and should) always keep in mind that the two sides are independent. What you're doing on your CPU side should be free to move around independently of what you're doing on your GPU side. Not only that, but in consumer hardware the systems are unrelated. People have assorted CPUs from dual-core and quad-core CPUs up through 64-core CPUs. Similarly, people have assorted graphics cards ranging from on-board graphics to the latest NVidia or AMD graphics boards. The two are independent of each other, someone can have an amazing CPU and crappy GPU, or a crappy CPU but amazing GPU, or whatever other combination. The work done by each should be considered as though they're separate systems, because they really are separate systems.

