Idk what issue you guys are talking about currently, but modeling diffuse with just 3 cones is still optimistic, and I'd expect big error from that. You mentioned random rotation to improve this(?), but if that only rotates the 3 directions around the normal, you basically sample a ‘circle’, not the whole halfspace. And you can't model cosine weighting at all, because all three directions sit at the same angle to the normal and so end up with the same weight.
I think the minimum would be 6 cones: one along the normal, with the other five again forming a circle around it. But now cosine weighting works: the normal cone has a weight of one, and the others get a lesser weight of dot(normal, tracingDirection).
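To make the 6 cone layout concrete, here is a rough sketch in Python (plain math, so it ports to a shader easily). It works in tangent space with the normal along +Z. The 60 degree tilt of the side cones, the function name six_cone_set and the normalization so the weights sum to one are my assumptions, not something fixed by the setup above.

```python
import math

def six_cone_set(tilt_deg=60.0):
    """Return (directions, weights) in tangent space, where the normal is +Z.

    tilt_deg is the angle between the normal and each of the five side
    cones (an assumed value, tune it together with the cone aperture).
    """
    tilt = math.radians(tilt_deg)
    normal = (0.0, 0.0, 1.0)

    # One cone along the normal, five more on a circle around it.
    directions = [normal]
    for i in range(5):
        phi = 2.0 * math.pi * i / 5.0
        directions.append((math.sin(tilt) * math.cos(phi),
                           math.sin(tilt) * math.sin(phi),
                           math.cos(tilt)))

    # Cosine weighting: dot(normal, tracingDirection), which is 1 for the
    # normal cone and cos(tilt) for the side cones.
    raw = [sum(n * d for n, d in zip(normal, direction))
           for direction in directions]

    # Normalize so the weighted sum of cone results stays in range.
    total = sum(raw)
    weights = [w / total for w in raw]
    return directions, weights
```

With the default 60 degree tilt, each side cone ends up with half the weight of the normal cone.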
You could also do Monte Carlo like in path tracing, to break up the circle shape and just get better sampling overall with any fixed number of cones.
I would do this using precomputed samples on a hemisphere, where the samples are cosine weighted and also separated by Poisson disk, so you don't sample directions which are too close together. The sample set could still be randomly rotated to hide the precomputed pattern.
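A minimal sketch of how such a precomputed set could be built, assuming simple dart throwing: draw cosine-weighted candidates and reject any that land closer than min_dist to an already accepted direction. All names and parameters here (precompute_directions, min_dist, seed, max_tries) are made up for illustration, and there are better Poisson disk generators than brute force rejection.

```python
import math
import random

def cosine_sample(rng):
    """Cosine-weighted direction on the +Z hemisphere (Malley's method)."""
    u1, u2 = rng.random(), rng.random()
    r = math.sqrt(u1)               # uniform point on the unit disk ...
    phi = 2.0 * math.pi * u2
    z = math.sqrt(max(0.0, 1.0 - u1))  # ... projected up onto the hemisphere
    return (r * math.cos(phi), r * math.sin(phi), z)

def precompute_directions(count, min_dist, seed=1, max_tries=100000):
    """Dart throwing: keep candidates at least min_dist from all kept ones.

    May return fewer than count directions if min_dist is too large.
    """
    rng = random.Random(seed)
    picked = []
    for _ in range(max_tries):
        if len(picked) == count:
            break
        candidate = cosine_sample(rng)
        if all(math.dist(candidate, p) >= min_dist for p in picked):
            picked.append(candidate)
    return picked

def rotate_about_normal(directions, angle):
    """Rotate the whole set around +Z, e.g. by a random angle per pixel."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in directions]
```

Because the directions are already distributed by cosine, every cone gets the same weight 1/N and no extra dot(normal, tracingDirection) factor is needed. Note that the rejection step skews the distribution a bit away from a pure cosine, so treat this as an approximation.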
Personally I've never done this in a progressive way, where you accumulate with results from previous frames even though the scene is changing and the camera is moving. You would do the accumulation either on voxels or in screenspace pixels (which then needs reprojection and TA).
But that's not hard to figure out I guess, and then you should also be able to sample the sky properly without relying on simplifying assumptions.
Not sure about temporal issues under motion, though.
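For the accumulation itself, one common option is an exponential moving average per voxel or per pixel. This is only a sketch of the blend step under my own assumptions: the function name accumulate and the blend factor alpha are made up here, and reprojection and history rejection are left out completely, which is exactly where the temporal issues under motion would show up.

```python
def accumulate(history, current, alpha=0.1):
    """Blend this frame's result into the stored value.

    history and current are RGB tuples; alpha is an assumed blend factor.
    Small alpha means smoother but laggier results, large alpha means
    noisier but more responsive results.
    """
    return tuple(h + alpha * (c - h) for h, c in zip(history, current))

# Usage: with a static input the stored value converges towards it.
value = (0.0, 0.0, 0.0)
for _ in range(100):
    value = accumulate(value, (1.0, 1.0, 1.0))
```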