11 hours ago, gustavo rincones said:
Since I created this post, I've only heard things I already knew, such as undefined behaviors, static members, trap representation in c ++ and modern compilers. I just want to see more efficient ways to calculate the square root.
I quickly reread everything without paying much attention to details, so sorry in advance if I missed something.
Don't forget that this is a forum for game developers. A topic like yours is better suited to a computer science forum. As I wrote in my previous posts, I doubt that it is really relevant for most games, since the number of square root calculations is usually comparatively low. If it does become relevant, the fast-math compiler flag is most likely the chosen solution for such a low-level problem, especially if the alternatives presented violate C++ coding rules. I don't think many game developers nowadays will spend a lot of time optimizing a square root calculation. They will probably rely on the math library they already use, where math experts have pulled off every trick they could. Only if profiling shows that the square root is the reason for a major fps hit might they get interested in alternative formulations. But in that case, they are probably looking for ways to remove the square root entirely.
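To illustrate what "removing the square root entirely" can look like, here is a minimal sketch of the usual range check done on squared distances. The `Vec2` type and the function names are made up for this example and not taken from any particular library:

```cpp
#include <cassert>

// Hypothetical 2D vector type, only for illustration.
struct Vec2 { float x; float y; };

// Squared Euclidean distance: no square root involved.
inline float distanceSquared(Vec2 a, Vec2 b)
{
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    return dx * dx + dy * dy;
}

// True if b lies within `range` of a. Both sides are non-negative, so
// comparing the squared values gives the same answer as comparing distances.
inline bool isWithinRange(Vec2 a, Vec2 b, float range)
{
    assert(range >= 0.0f); // squaring a negative range would give a wrong answer
    return distanceSquared(a, b) <= range * range;
}
```

Something like `isWithinRange(player, enemy, 5.0f)` then answers "is the enemy within 5 units?" without ever computing a root.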
Also, what your post needs to make it interesting for many people here is hard numbers. You list some advantages and disadvantages, but that doesn't tell us much. Pick some specific examples, benchmark them and compare the resulting floating-point error. This is what most people here are interested in (if they are interested in the first place) and what they can work with. It will let them decide whether one of the presented methods is a real alternative for them. Otherwise, they are just some formulas that aren't written in a standard-compliant C++ way. By the way, this explains why you got the answers you got.
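As a rough idea of how the error comparison could be done, here is a small sketch that samples a candidate reciprocal square root over an interval and reports its worst relative error against a double-precision reference. The names `maxRelativeError` and `invSqrtStd` are my own, and an evenly spaced sample is only an approximation of the true worst case:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Largest relative error of `candidate` against a double-precision reference
// for 1/sqrt(x), sampled at `steps` evenly spaced points in [lo, hi].
template <typename Func>
double maxRelativeError(Func candidate, float lo, float hi, int steps)
{
    assert(lo > 0.0f && hi > lo && steps > 1);
    double worst = 0.0;
    for (int i = 0; i < steps; ++i)
    {
        const float x = lo + (hi - lo) * static_cast<float>(i) / static_cast<float>(steps - 1);
        const double reference = 1.0 / std::sqrt(static_cast<double>(x));
        const double approx = static_cast<double>(candidate(x));
        worst = std::max(worst, std::fabs(approx - reference) / reference);
    }
    return worst;
}

// Example candidate: the plain single-precision standard-library version.
inline float invSqrtStd(float x)
{
    return 1.0f / std::sqrt(x);
}
```

You would call it as `maxRelativeError(invSqrtStd, 0.5f, 100.0f, 100000)` and then pass each of your own implementations in place of `invSqrtStd` to get numbers that can be compared directly.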
11 hours ago, gustavo rincones said:
and since the creation of this publication, nobody has placed any sample code.
Don't get me wrong, I like this low-level stuff, but I think there are just a few people here who can actually provide code for this problem and are also interested in such a low-level topic. Most people can only comment on the code presented, because they have never tackled this specific problem themselves: it was never necessary. Computer games are such complex beasts that you avoid spending time on such topics unless you really have to address them. Since there is a standard solution that is already very fast, why should one even bother, except out of pure curiosity? Just take a look here:
https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_rsqrt_ps&expand=4803
The standard reciprocal square root for vector registers has a latency of just 4 cycles on a Skylake processor. That's the same latency as a single floating-point addition or multiplication. Why should a game developer even try to optimize such a fast operation unless they are making a game about calculating square roots?
Long story short:
The problem you are addressing is not really a concern for today's game developers. It is more of an interesting scientific problem. Hence, the chances that somebody here can really discuss this problem with you in the way you want are rather low.
Greetings and good luck
EDIT:
An additional piece of information:
http://www.tommesani.com/index.php/component/content/article/2-simd/61-sse-reciprocal.html
I just had a short look, but it seems that the SSE reciprocal square root is hardware-accelerated ("These instructions are implemented via hardware lookup tables"). So you probably have no real chance of getting much faster than that. Only a better accuracy-to-performance ratio might be a reason to pick an alternative function.
EDIT2:
You can compare performance and accuracy of your implementations against this function:
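#include <xmmintrin.h> // SSE intrinsics; also available on MSVC, unlike <x86intrin.h>

// Approximate 1/sqrt(val) with the SSE reciprocal square root instruction.
float FastInvSqrt(float val)
{
    __m128 reg = _mm_set1_ps(val);            // broadcast val into all four lanes
    return _mm_cvtss_f32(_mm_rsqrt_ss(reg)); // approximate the lowest lane and extract it
}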
It uses the hardware-accelerated SSE instruction to calculate the reciprocal square root. I haven't benchmarked it myself (maybe I'll do it later), but I think you will see that there is not really a point in trying to implement your own square root (at least for the reciprocal) as a game developer. If you need accuracy, you will use the standard `sqrt`. If you need raw speed, you will probably use the hardware-accelerated version, unless it is not available on your processor.
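If you want to time it, something along these lines might be a starting point. It is only a sketch: `rsqrtHardware`, `rsqrtExact`, `nanosecondsPerCall` and the `HAS_SSE_RSQRT` macro are names I made up, the code falls back to the standard `sqrt` when SSE is not available, and a naive loop like this is easily distorted by compiler optimizations (inlining, vectorization, fast-math), so treat the numbers with care:

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <vector>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAS_SSE_RSQRT 1
#else
#define HAS_SSE_RSQRT 0
#endif

// Hardware approximation where SSE exists, standard-library fallback otherwise.
inline float rsqrtHardware(float x)
{
#if HAS_SSE_RSQRT
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
#else
    return 1.0f / std::sqrt(x);
#endif
}

// Accurate baseline to compare against.
inline float rsqrtExact(float x)
{
    return 1.0f / std::sqrt(x);
}

// Average wall-clock time per call of f over all inputs, in nanoseconds.
// The accumulated sum is written to `sink` so the loop is not optimized away.
template <typename Func>
double nanosecondsPerCall(Func f, const std::vector<float>& inputs, int repeats, float& sink)
{
    assert(!inputs.empty() && repeats > 0);
    using Clock = std::chrono::steady_clock;
    float sum = 0.0f;
    const auto start = Clock::now();
    for (int r = 0; r < repeats; ++r)
        for (float x : inputs)
            sum += f(x);
    const auto stop = Clock::now();
    sink = sum;
    const double ns = std::chrono::duration<double, std::nano>(stop - start).count();
    return ns / (static_cast<double>(inputs.size()) * repeats);
}
```

Fill a `std::vector<float>` with representative positive values, call `nanosecondsPerCall` once with `rsqrtHardware` and once with `rsqrtExact` (and with your own versions), and compare the results across several runs rather than trusting a single measurement.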