I don't know how their solution works, but I implemented my ThreadPool and Task System differently. I don't have a single scheduling thread; instead I use (nearly) lock-free queues that support concurrent multiple-reader/multiple-writer access. Each worker has its own queue, and there is also a shared queue for pending jobs.
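A minimal sketch of that layout, assuming jobs are plain `std::function<void()>` callables. The names `JobQueue` and `JobSystem` are hypothetical, and the queue is guarded by a mutex purely to keep the example short, so it is a stand-in for the (nearly) lock-free queue described above, not an implementation of one.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

using Job = std::function<void()>;

// Stand-in for the (nearly) lock-free multi-producer/multi-consumer queue.
// A mutex keeps the sketch short; this version is NOT lock-free.
class JobQueue {
public:
    void push(Job job) {
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push_back(std::move(job));
    }

    // Returns the oldest job, or std::nullopt if the queue is empty.
    std::optional<Job> tryPop() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (jobs_.empty()) return std::nullopt;
        Job job = std::move(jobs_.front());
        jobs_.pop_front();
        return job;
    }

private:
    std::mutex mutex_;
    std::deque<Job> jobs_;
};

// Hypothetical layout: no central scheduler thread, just one queue per
// worker plus one shared queue for pending (suspended) jobs.
struct JobSystem {
    explicit JobSystem(std::size_t workerCount) : workerQueues(workerCount) {}

    std::vector<JobQueue> workerQueues;  // index i belongs to worker i
    JobQueue pendingJobs;                // shared by all workers
};
```

In the real system the pending queue holds suspended fibers rather than fresh callables; the sketch reuses the same `Job` type only for brevity.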
When a task is started, the system decides which worker receives it using an incrementing counter. Once the counter exceeds the number of cores/workers acquired, it wraps back around to the first one. This way new tasks are spread evenly over all cores and the whole bandwidth of the system is used.
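The round-robin assignment can be sketched with a single atomic counter. `RoundRobinDispatcher` and `nextWorker` are hypothetical names; the worker count would typically come from something like `std::thread::hardware_concurrency()`, which is an assumption here, not something the post specifies.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical helper that picks the worker receiving the next new task.
class RoundRobinDispatcher {
public:
    // workerCount must be greater than zero.
    explicit RoundRobinDispatcher(std::size_t workerCount)
        : workerCount_(workerCount) {}

    std::size_t nextWorker() {
        // fetch_add is atomic, so concurrent callers each get a distinct ticket.
        std::size_t ticket = counter_.fetch_add(1, std::memory_order_relaxed);
        // Taking the ticket modulo the worker count wraps back to worker 0
        // after the last worker, without ever resetting the counter.
        return ticket % workerCount_;
    }

private:
    std::atomic<std::size_t> counter_{0};
    std::size_t workerCount_;
};
```

Using the modulo instead of resetting the counter gives the same wrap-around while avoiding a race between threads that would otherwise both try to reset it. Note that round-robin balances the number of tasks per worker, not their cost; that gap is what work stealing covers.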
Worker threads that have run out of work try to find another worker with ‘too much’ work and grab jobs from its backlog instead. This technique is called ‘work stealing’ and should also keep the load balanced across all threads in the job system.
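A minimal work-stealing sketch, assuming "too much work" means "the most queued jobs" and that a thief takes from the back of the victim's queue while the owner pops from the front. `WorkerQueue` and `findWork` are hypothetical, and the mutex-guarded deque again stands in for a lock-free queue.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>
#include <vector>

using Job = std::function<void()>;

// Mutex-guarded stand-in for a per-worker lock-free queue.
struct WorkerQueue {
    std::mutex mutex;
    std::deque<Job> jobs;

    std::optional<Job> popFront() {  // used by the owning worker
        std::lock_guard<std::mutex> lock(mutex);
        if (jobs.empty()) return std::nullopt;
        Job job = std::move(jobs.front());
        jobs.pop_front();
        return job;
    }

    std::optional<Job> popBack() {  // used by thieves
        std::lock_guard<std::mutex> lock(mutex);
        if (jobs.empty()) return std::nullopt;
        Job job = std::move(jobs.back());
        jobs.pop_back();
        return job;
    }

    std::size_t size() {
        std::lock_guard<std::mutex> lock(mutex);
        return jobs.size();
    }
};

// Find work for worker `self`: its own queue first, otherwise steal from
// the busiest other worker.
std::optional<Job> findWork(std::vector<WorkerQueue>& queues, std::size_t self) {
    if (auto job = queues[self].popFront()) return job;

    std::size_t victim = self;
    std::size_t most = 0;
    for (std::size_t i = 0; i < queues.size(); ++i) {
        if (i == self) continue;
        std::size_t n = queues[i].size();
        if (n > most) {
            most = n;
            victim = i;
        }
    }
    if (victim == self) return std::nullopt;  // nobody has anything to steal

    // The victim may have drained its queue since we looked; popBack
    // then simply returns std::nullopt.
    return queues[victim].popBack();
}
```

Stealing from the opposite end keeps the thief and the owner apart most of the time, which is the usual reason lock-free work-stealing deques are built this way.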
If a worker is still out of work, it looks for pending jobs. I use fibers, each with its own stack, that save the CPU registers to memory pages, so a job truly runs on its own, isolated from other jobs. Jobs can yield back to the scheduler when needed and are placed in the pending jobs queue until certain conditions are met. Every worker can access the pending jobs queue at any time to test whether the next job can be rescheduled.
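The rescheduling check on the pending queue can be sketched as follows. The fiber switch itself is deliberately left out: a suspended job is modeled as a hypothetical `PendingJob` holding a readiness condition and a `resume` callback, where a real system would switch to the job's fiber instead.

```cpp
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>

// Hypothetical model of a job that yielded back to the scheduler.
// In the real system `resume` would switch to the job's fiber (its own
// stack and saved registers); here it is just a callback.
struct PendingJob {
    std::function<bool()> isReady;  // condition the job is waiting for
    std::function<void()> resume;   // continue the suspended job
};

class PendingQueue {
public:
    void park(PendingJob job) {
        std::lock_guard<std::mutex> lock(mutex_);
        jobs_.push_back(std::move(job));
    }

    // Called by any worker that is out of work: test the next pending job.
    // If its condition holds, hand it back for rescheduling; otherwise move
    // it to the back of the queue and report that nothing is ready.
    std::optional<PendingJob> tryTakeNext() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (jobs_.empty()) return std::nullopt;
        PendingJob job = std::move(jobs_.front());
        jobs_.pop_front();
        if (job.isReady()) return job;
        jobs_.push_back(std::move(job));
        return std::nullopt;
    }

    bool empty() {
        std::lock_guard<std::mutex> lock(mutex_);
        return jobs_.empty();
    }

private:
    std::mutex mutex_;
    std::deque<PendingJob> jobs_;
};
```

Evaluating `isReady` under the lock keeps the sketch simple; it assumes the condition is a cheap check such as reading an atomic dependency counter.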
Finally, worker threads can switch into an idle state. They stop consuming CPU time unless there are still pending jobs waiting to be rescheduled, and they are reactivated when new jobs appear in their queue.
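One way to get that idle behavior is to block on a `std::condition_variable`. In this sketch the hypothetical `WorkerInbox::next` only lets the worker sleep when its inbox is empty and the caller reports no pending jobs (the `pendingJobsExist` flag is an assumption about how the worker learns this); pushing a job wakes it up again.

```cpp
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <optional>
#include <utility>

using Job = std::function<void()>;

// Hypothetical per-worker inbox that lets an idle worker sleep on a
// condition variable instead of spinning.
class WorkerInbox {
public:
    void push(Job job) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.push_back(std::move(job));
        }
        wake_.notify_one();  // reactivate the worker if it is idle
    }

    // Next job for this worker. With an empty inbox the worker only goes
    // idle (blocks, using no CPU) if no pending jobs are waiting to be
    // rescheduled; otherwise it returns std::nullopt right away so the
    // caller can go back to polling the pending queue.
    std::optional<Job> next(bool pendingJobsExist) {
        std::unique_lock<std::mutex> lock(mutex_);
        if (jobs_.empty() && !pendingJobsExist) {
            wake_.wait(lock, [this] { return !jobs_.empty() || stopped_; });
        }
        if (jobs_.empty()) return std::nullopt;
        Job job = std::move(jobs_.front());
        jobs_.pop_front();
        return job;
    }

    // Wakes an idle worker for shutdown.
    void stop() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopped_ = true;
        }
        wake_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable wake_;
    std::deque<Job> jobs_;
    bool stopped_ = false;
};
```

The predicate form of `wait` handles spurious wakeups and the case where a job arrives before the worker starts waiting. A caveat of this sketch: a job parked in the pending queue while the worker sleeps does not wake it, so whoever parks a job would also need to notify idle workers.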
I watched this GDC talk and experimented with other mechanisms such as setjmp/longjmp before, but I'm finally happy with the fiber solution.