SimpleMath - a simplified wrapper for DirectXMath

posted in Shawn Hargreaves' Blog

Published January 08, 2013


SimpleMath, created by my colleague Chuck Walbourn, is a header file that wraps the DirectXMath SIMD vector/matrix math API with an easier to use C++ interface. It provides the following types, with similar names, methods, and operator overloads to the XNA Game Studio math API:

Vector2
Vector3
Vector4
Matrix
Color
Plane
Quaternion
Ray
BoundingSphere
BoundingBox

Download SimpleMath here

Why wrap DirectXMath?

DirectXMath provides highly optimized vector and matrix math functions, which take advantage of SSE SIMD intrinsics when compiled for x86/x64, or the ARM NEON instruction set when compiled for an ARM platform such as Windows RT or Windows Phone. The downside of being designed for efficient SIMD usage is that DirectXMath can be somewhat complicated to work with. Developers must be aware of correct type usage (understanding the difference between SIMD register types such as XMVECTOR vs. memory storage types such as XMFLOAT4), must take care to maintain correct alignment for SIMD heap allocations, and must carefully structure their code to avoid accessing individual components from a SIMD register. This complexity is necessary for optimal SIMD performance, but sometimes you just want to get stuff working without so much hassle!

Enter SimpleMath...

The SimpleMath types derive from the equivalent DirectXMath memory storage types (for instance Vector3 is derived from XMFLOAT3), so they can be stored in arbitrary locations without worrying about SIMD alignment, and individual components can be accessed without bothering to call SIMD accessor functions. But unlike XMFLOAT3, the Vector3 type defines a rich set of methods and overloaded operators, so it can be directly manipulated without having to first load its value into an XMVECTOR. Vector3 also defines an operator for automatic conversion to XMVECTOR, so it can be passed directly to methods that were written to use the lower level DirectXMath types.
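To make that pattern concrete without depending on the real headers, here is a minimal, portable sketch. It is not SimpleMath's actual source: Float3, Reg4, Load, Store, Add, Vec3, SumOfComponents and Demo are hypothetical stand-ins (roughly for XMFLOAT3, XMVECTOR, the load/store helpers, and Vector3), and the "register" here is just an aligned struct rather than a real SIMD register.

```cpp
#include <cassert>

// Hypothetical stand-in for a memory storage type such as XMFLOAT3:
// three plain floats with no special alignment requirement.
struct Float3 {
    float x, y, z;
};

// Hypothetical stand-in for a SIMD register type such as XMVECTOR.
// A real register type maps onto hardware; this one only models it.
struct alignas(16) Reg4 {
    float v[4];
};

// Stand-ins for the explicit load/store helpers of the lower level API.
inline Reg4 Load(const Float3& f) { return Reg4{{f.x, f.y, f.z, 0.0f}}; }
inline void Store(Float3& f, const Reg4& r) { f.x = r.v[0]; f.y = r.v[1]; f.z = r.v[2]; }

inline Reg4 Add(const Reg4& a, const Reg4& b) {
    Reg4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}

// The wrapper derives from the storage type, so it can live anywhere in
// memory and expose x/y/z directly, while its operators hide the
// load -> compute -> store round trip.
struct Vec3 : Float3 {
    Vec3() : Float3{0.0f, 0.0f, 0.0f} {}
    Vec3(float x_, float y_, float z_) : Float3{x_, y_, z_} {}

    // Implicit conversion to the register type, so a Vec3 can be passed
    // straight to functions written against the lower level types.
    operator Reg4() const { return Load(*this); }

    Vec3& operator+=(const Vec3& rhs) {
        Store(*this, Add(Load(*this), Load(rhs)));
        return *this;
    }
};

// A "lower level" function that only knows about the register type.
inline float SumOfComponents(Reg4 r) { return r.v[0] + r.v[1] + r.v[2]; }

// Usage: operator overloads, direct component access, implicit conversion.
inline float Demo() {
    Vec3 position(1.0f, 2.0f, 3.0f);
    const Vec3 velocity(0.5f, 0.0f, -1.0f);
    position += velocity;               // no explicit load/store in user code
    assert(position.x == 1.5f);
    position.x = 10.0f;                 // plain member access
    return SumOfComponents(position);   // Vec3 converts to Reg4 here
}
```

The point of the sketch is only the shape of the design: storage type as base class, operators that do the conversions internally, and a conversion operator for interop.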

If that sounds horribly confusing, the short version is that the SimpleMath types pretty much Just Work(TM) the way you would expect them to.

By now you must be wondering, where is the catch? And of course there is one. SimpleMath hides the complexities of SIMD programming by automatically converting back and forth between memory and SIMD register types, which tends to generate additional load and store instructions. This can add significant overhead compared to the lower level DirectXMath approach, where SIMD loads and stores are under explicit control of the programmer.

Who is SimpleMath for?

You should use SimpleMath if you are:

Looking for a C++ math library with similar API to the C# Microsoft.Xna.Framework types
Porting existing XNA code from C# to C++
Wanting to optimize for programmer efficiency (simplicity, readability, development speed) at the expense of runtime efficiency

You should go straight to the underlying DirectXMath API if you:

Want to create the fastest possible code
Enjoy the lateral thinking sometimes needed to express an algorithm in terms of SIMD operations

This need not be a global either/or decision. The SimpleMath types know how to convert themselves to and from the corresponding DirectXMath types, so it is easy to mix and match. You can use SimpleMath for the parts of your program where readability and development time matter most, then drop down to DirectXMath for performance hotspots where runtime efficiency is more important.

Example

Here is a simple object movement calculation, implemented using DirectXMath. Note the skullduggery to make sure the PlayerCat instance will always be 16-byte aligned (and I didn't even include the implementation of the AlignedNew helper here!):

    #include <DirectXMath.h>

    using namespace DirectX;

    __declspec(align(16)) class PlayerCat : public AlignedNew<PlayerCat>
    {
    public:
        void Update()
        {
            const float cFriction = 0.99f;

            XMVECTOR pos = XMLoadFloat3A(&mPosition);
            XMVECTOR vel = XMLoadFloat3A(&mVelocity);

            XMStoreFloat3A(&mPosition, pos + vel);
            XMStoreFloat3A(&mVelocity, vel * cFriction);
        }

    private:
        XMFLOAT3A mPosition;
        XMFLOAT3A mVelocity;
    };
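The AlignedNew helper is not shown in this post. As a rough idea of what such a helper might look like, here is a sketch assuming a C++17 compiler. It is not the author's implementation: apart from the names AlignedNew and PlayerCat, everything here (the IsAligned and Demo helpers, the float-array members, and the use of alignas in place of __declspec(align(16))) is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Hypothetical CRTP helper: routes new/delete for the derived class through
// the C++17 aligned allocation functions, so heap instances honor
// alignof(TDerived) even when the default allocator would not.
template <typename TDerived>
struct AlignedNew {
    static void* operator new(std::size_t size) {
        return ::operator new(size, std::align_val_t(alignof(TDerived)));
    }
    static void operator delete(void* ptr) {
        ::operator delete(ptr, std::align_val_t(alignof(TDerived)));
    }
    static void* operator new[](std::size_t size) {
        return ::operator new[](size, std::align_val_t(alignof(TDerived)));
    }
    static void operator delete[](void* ptr) {
        ::operator delete[](ptr, std::align_val_t(alignof(TDerived)));
    }
};

// Simplified stand-in for the PlayerCat class: two 16-byte blocks that an
// aligned SIMD load could read directly.
struct alignas(16) PlayerCat : AlignedNew<PlayerCat> {
    float position[4];
    float velocity[4];
};

// Returns true if the pointer is a multiple of the requested alignment.
inline bool IsAligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Usage: a heap-allocated instance comes back suitably aligned.
inline void Demo() {
    PlayerCat* cat = new PlayerCat();
    assert(IsAligned(cat, alignof(PlayerCat)));
    delete cat;
}
```

Whatever form the helper takes, every class that embeds aligned SIMD storage and is allocated on the heap needs something like it, which is exactly the bookkeeping SimpleMath lets you skip.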

Using SimpleMath, the same math is, well, a little more simple :-)

    #include "SimpleMath.h"

    using namespace DirectX::SimpleMath;

    class PlayerCat
    {
    public:
        void Update()
        {
            const float cFriction = 0.99f;

            mPosition += mVelocity;
            mVelocity *= cFriction;
        }

    private:
        Vector3 mPosition;
        Vector3 mVelocity;
    };

Here is the x86 SSE code generated for the DirectXMath version of the Update method:

    movaps      xmm2,xmmword ptr [ecx+10h]
    movaps      xmm1,xmmword ptr [ecx]
    andps       xmm2,xmmword ptr [?g_XMMask3@DirectX@@3UXMVECTORI32@1@B]
    andps       xmm1,xmmword ptr [?g_XMMask3@DirectX@@3UXMVECTORI32@1@B]
    movaps      xmm0,xmmword ptr [__xmm@3f7d70a43f7d70a43f7d70a43f7d70a4]
    addps       xmm1,xmm2
    mulps       xmm0,xmm2
    movq        mmword ptr [ecx],xmm1
    shufps      xmm1,xmm1,0AAh
    movss       dword ptr [ecx+8],xmm1
    movq        mmword ptr [ecx+10h],xmm0
    shufps      xmm0,xmm0,0AAh
    movss       dword ptr [ecx+18h],xmm0
    ret

The SimpleMath version generates slightly more than twice as many machine instructions (30 versus 14):

    movss       xmm2,dword ptr [ecx]
    movss       xmm0,dword ptr [ecx+4]
    movss       xmm1,dword ptr [ecx+0Ch]
    unpcklps    xmm2,xmm0
    movss       xmm0,dword ptr [ecx+8]
    movlhps     xmm2,xmm0
    movss       xmm0,dword ptr [ecx+10h]
    unpcklps    xmm1,xmm0
    movss       xmm0,dword ptr [ecx+14h]
    movlhps     xmm1,xmm0
    addps       xmm2,xmm1
    movss       dword ptr [ecx],xmm2
    movaps      xmm0,xmm2
    shufps      xmm0,xmm2,55h
    movss       dword ptr [ecx+4],xmm0
    shufps      xmm2,xmm2,0AAh
    movss       dword ptr [ecx+8],xmm2
    movss       xmm1,dword ptr [ecx+0Ch]
    movss       xmm0,dword ptr [ecx+10h]
    unpcklps    xmm1,xmm0
    movss       xmm0,dword ptr [ecx+14h]
    movlhps     xmm1,xmm0
    mulps       xmm1,xmmword ptr [__xmm@3f7d70a43f7d70a43f7d70a43f7d70a4]
    movaps      xmm0,xmm1
    movss       dword ptr [ecx+0Ch],xmm1
    shufps      xmm0,xmm1,55h
    shufps      xmm1,xmm1,0AAh
    movss       dword ptr [ecx+10h],xmm0
    movss       dword ptr [ecx+14h],xmm1
    ret

Most of this difference is because I was able to use aligned loads and stores in the DirectXMath version, while the SimpleMath code must do extra work to handle memory locations that might not be properly aligned. Also note how the SimpleMath version loads the mVelocity value from memory into SIMD registers twice, while the extra control offered by DirectXMath allowed me to do this just once.
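The double load of mVelocity falls straight out of the source-level pattern, and can be modeled with a small instrumented sketch. This is an illustration only, not real DirectXMath or SimpleMath code: Float3, Reg4, Vec3, Load, Store, Add, Scale, the counters, and both Update functions are hypothetical stand-ins, and the counts describe the conversions written in the source, not what any particular compiler will emit.

```cpp
#include <cassert>

struct Float3 { float x, y, z; };          // stand-in for a storage type
struct alignas(16) Reg4 { float v[4]; };   // stand-in for a register type

// Counters for memory <-> "register" conversions.
static int g_loads = 0;
static int g_stores = 0;

Reg4 Load(const Float3& f) { ++g_loads; return Reg4{{f.x, f.y, f.z, 0.0f}}; }
void Store(Float3& f, const Reg4& r) { ++g_stores; f.x = r.v[0]; f.y = r.v[1]; f.z = r.v[2]; }

Reg4 Add(const Reg4& a, const Reg4& b) {
    Reg4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] + b.v[i];
    return r;
}
Reg4 Scale(const Reg4& a, float s) {
    Reg4 r;
    for (int i = 0; i < 4; ++i) r.v[i] = a.v[i] * s;
    return r;
}

// Wrapper type: every operator converts its operands on the way in and
// stores the result on the way out.
struct Vec3 : Float3 {
    Vec3(float x_, float y_, float z_) : Float3{x_, y_, z_} {}
    Vec3& operator+=(const Vec3& rhs) { Store(*this, Add(Load(*this), Load(rhs))); return *this; }
    Vec3& operator*=(float s) { Store(*this, Scale(Load(*this), s)); return *this; }
};

struct Counts { int loads; int stores; };

// Wrapper style: velocity is loaded once for += and again for *=.
Counts UpdateWithWrapper(Vec3& position, Vec3& velocity, float friction) {
    g_loads = g_stores = 0;
    position += velocity;
    velocity *= friction;
    return Counts{g_loads, g_stores};
}

// Explicit style: each value is loaded once and reused from the register.
Counts UpdateWithRegisters(Float3& position, Float3& velocity, float friction) {
    g_loads = g_stores = 0;
    const Reg4 pos = Load(position);
    const Reg4 vel = Load(velocity);
    Store(position, Add(pos, vel));
    Store(velocity, Scale(vel, friction));
    return Counts{g_loads, g_stores};
}

// Usage: same math, one extra load in the wrapper version.
void Demo() {
    Vec3 p(1.0f, 2.0f, 3.0f), v(2.0f, 4.0f, 6.0f);
    const Counts wrapper = UpdateWithWrapper(p, v, 0.5f);
    assert(wrapper.loads == 3 && wrapper.stores == 2);

    Float3 p2{1.0f, 2.0f, 3.0f}, v2{2.0f, 4.0f, 6.0f};
    const Counts explicitStyle = UpdateWithRegisters(p2, v2, 0.5f);
    assert(explicitStyle.loads == 2 && explicitStyle.stores == 2);
}
```

In this toy model the wrapper costs one extra load for two operations; the gap grows with longer expressions, since every operator repeats the round trip.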

But hey, sometimes performance isn't the most important goal. If you care more about optimizing for developer efficiency, SimpleMath could be for you.

Resources

http://blogs.msdn.com/b/chuckw/archive/2012/03/27/introducing-directxmath.aspx

http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-sse-sse2-and-arm-neon.aspx

http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-sse3-and-ssse3.aspx

http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-sse4-1-and-sse-4-2.aspx

http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-avx.aspx

http://blogs.msdn.com/b/chuckw/archive/2012/09/11/directxmath-f16c-and-fma.aspx
