Debugging Direct3D programs: a taxonomy of error conditions

Published November 27, 2012
Eric Lippert wrote a fantastic article that sorts C# exceptions into four categories:


  • Exogenous exceptions occur due to the messy nature of reality. Filesystems can run out of space. Network connections can drop. These things are rare, but they do happen and robust code needs to be ready to deal with them.

  • Boneheaded exceptions, as Eric puts it, "are your own darn fault, you could have prevented them and therefore they are bugs in your code. You should not catch them; doing so is hiding a bug in your code. Rather, you should write your code so that the exception cannot possibly happen in the first place".

  • Fatal exceptions occur when something truly unexpected went wrong (memory corruption, thread abort, etc.). Again in Eric's words, "not your fault, you cannot prevent them, and you cannot sensibly clean up from them. They almost always happen because the process is deeply diseased and is about to be put out of its misery".

  • Finally, Vexing exceptions are caused by design flaws where someone foolishly decided to throw an exception in response to a situation that was not truly exceptional. Sometimes you are forced to catch these in order to use the offending API, but ideally that API should be changed to report non-exceptional results in some other way (e.g. as return values).

At this point the attentive reader may be wondering, what on earth does C# exception handling have to do with D3D? In a COM-based C++ API such as D3D, errors are reported via HRESULT return values, not exceptions, right?
Not so fast...
D3D actually has three different mechanisms for reporting errors, which correspond exactly to the first three of Eric's exception categories. There is no equivalent of vexing exceptions, because the D3D API does not use exceptions.

Exogenous errors in D3D

D3D reports exogenous errors by returning failure HRESULT codes. You should always check these return values and handle any failures.
Note that some D3D methods have a return type of void instead of HRESULT. This is because not every method has an exogenous failure path. For instance, ID3D11Device::CreateBlendState returns an HRESULT (which you should check and handle), but ID3D11DeviceContext::OMSetBlendState returns void. This does not mean OMSetBlendState can never fail, only that there is no point checking a return value, because every possible OMSetBlendState failure is boneheaded or fatal. You could pass a pointer to a state object that had previously been released, but as long as you specify valid parameters, OMSetBlendState will always succeed.
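To make the distinction concrete, here is a minimal sketch rather than real D3D code. So that it compiles without the Windows SDK, it uses portable stand-ins for HRESULT and the FAILED test, and CreateBlendStateModel, ContextModel and SetUpBlending are hypothetical placeholders for ID3D11Device::CreateBlendState, a device context, and your own setup code. The creation call returns an HRESULT that gets checked and handled; the state-setting call returns void, so there is nothing to check.

```cpp
#include <cstdint>
#include <cstdio>

// Portable stand-ins so the sketch builds anywhere; on Windows use the
// real HRESULT, S_OK, E_OUTOFMEMORY and FAILED from the SDK headers.
using HRESULT = std::int32_t;
constexpr HRESULT kOk = 0;                                           // S_OK
constexpr HRESULT kOutOfMemory = static_cast<HRESULT>(0x8007000Eu);  // E_OUTOFMEMORY
constexpr bool Failed(HRESULT hr) { return hr < 0; }                 // same test as FAILED(hr)

struct BlendState { bool additive; };

// Hypothetical stand-in for ID3D11Device::CreateBlendState. Creating an
// object can fail for exogenous reasons, so it reports an HRESULT.
HRESULT CreateBlendStateModel(bool additive, bool simulateOutOfMemory, BlendState* out) {
    if (simulateOutOfMemory) return kOutOfMemory;
    *out = BlendState{additive};
    return kOk;
}

// Hypothetical stand-in for a device context. Binding a valid state object
// has no exogenous failure path, so, like OMSetBlendState, it returns void.
struct ContextModel {
    const BlendState* current = nullptr;
    void SetBlendState(const BlendState* state) { current = state; }
};

// Returns false if creation failed; the caller decides how to recover
// (free some memory and retry, fall back to a simpler effect, tell the user).
bool SetUpBlending(ContextModel& context, BlendState& storage, bool simulateOutOfMemory) {
    HRESULT hr = CreateBlendStateModel(true, simulateOutOfMemory, &storage);
    if (Failed(hr)) {
        std::fprintf(stderr, "CreateBlendState failed: 0x%08X\n", static_cast<unsigned>(hr));
        return false;
    }
    context.SetBlendState(&storage);  // void: nothing to check here
    return true;
}
```

The asymmetry is the point: every HRESULT-returning call gets an explicit failure branch, while the void call is simply made with parameters you already know are valid.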

Boneheaded errors in D3D

By default, D3D does not bother to check for boneheaded errors at all. This is because it takes some time to validate everything, and hey, this should be unnecessary as long as you don't have any bugs in your program, right? :-)
If you do make a boneheaded error calling into D3D, the result is undefined behavior. You might get lucky and the call happens to work as you expect in spite of the mistake, or it might work on one computer but not on others, or it might crash, or it might mess things up so that some other call crashes later on, or you might just get weird rendering results. This is not good.
D3D provides an optional debug layer to help you find and fix such problems. This is enabled if you pass the D3D11_CREATE_DEVICE_DEBUG flag when creating your D3D device. It adds detailed parameter and state validation, which slows things down but will print nice messages back to the debugger output pane if it finds any boneheaded errors.
If you used the Windows 8 or Windows Phone 8 project templates as a starting point for your D3D app, the debug layer will be turned on by default in debug builds. The Direct3DBase::CreateDeviceResources method does this like so:

UINT creationFlags = D3D11_CREATE_DEVICE_BGRA_SUPPORT;

#if defined(_DEBUG)
// If the project is in a debug build, enable debugging via SDK Layers with this flag.
creationFlags |= D3D11_CREATE_DEVICE_DEBUG;
#endif
It's always a good idea to test with the debug layer turned on, keep an eye on the output pane in Visual Studio, and fix any issues it complains about.
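One practical wrinkle: the debug layer is provided by the SDK Layers component, which may not be installed on every machine a debug build runs on, and device creation with D3D11_CREATE_DEVICE_DEBUG then fails. A common pattern is to retry without the flag. The sketch below assumes a hypothetical createDevice callback standing in for a D3D11CreateDevice call with the given flags, and a debugBuild parameter standing in for the #if defined(_DEBUG) check; the flag values match d3d11.h, and HRESULT/UINT are portable stand-ins.

```cpp
#include <cstdint>
#include <functional>

using HRESULT = std::int32_t;  // stand-in for the Windows typedef
using UINT = unsigned int;

// Flag values as defined in d3d11.h.
constexpr UINT kCreateDeviceDebug = 0x2;         // D3D11_CREATE_DEVICE_DEBUG
constexpr UINT kCreateDeviceBgraSupport = 0x20;  // D3D11_CREATE_DEVICE_BGRA_SUPPORT

// createDevice is a hypothetical wrapper around D3D11CreateDevice that
// takes the creation flags and returns its HRESULT.
HRESULT CreateDeviceWithDebugFallback(const std::function<HRESULT(UINT)>& createDevice,
                                      bool debugBuild, UINT* usedFlags) {
    const UINT baseFlags = kCreateDeviceBgraSupport;
    if (debugBuild) {
        const UINT debugFlags = baseFlags | kCreateDeviceDebug;
        const HRESULT hr = createDevice(debugFlags);
        if (hr >= 0) {  // same test as SUCCEEDED(hr)
            *usedFlags = debugFlags;
            return hr;
        }
        // Creation with the debug flag failed, possibly because the SDK
        // Layers are missing; try again without the validation layer.
    }
    *usedFlags = baseFlags;
    return createDevice(baseFlags);
}
```

Retrying on any failure is a simplification. A stricter version could retry only when the failure indicates a missing SDK component (DXGI_ERROR_SDK_COMPONENT_MISSING), and should log that validation is off so a "clean" debug run is not mistaken for a validated one.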

Fatal errors in D3D

Unexpected, non-recoverable D3D errors can (rarely) occur for several reasons:


  • The GPU hardware could hang due to a driver bug, hardware fault, overheating, etc. It must then be reset (basically rebooting the GPU) before rendering can continue. This is referred to as a TDR, which stands for Timeout Detection and Recovery.

  • The GPU could have been physically removed from the computer (which happens when undocking certain laptop models) or its driver could have been updated on the fly.

  • The driver could have a bug that got it into a messed-up state.

  • The driver or D3D runtime could run out of memory in a place that isn't recoverable. Some allocation failures (for instance not having enough room to create a new texture) are easy to report back to the caller, but others (for instance failing to allocate an internal renamed resource while flushing a command buffer) could leave things in an ambiguous partial state where some amount of rendering might already have taken place yet other work cannot be completed. There is no good way to describe such a situation to the caller, and not really anything they could do to continue past the problem.

  • A boneheaded error could have gone unnoticed (especially if the debug D3D layer was not in use), so the D3D runtime passed invalid parameters through to the driver. What happens next is entirely up to the driver (this is undefined behavior!), but one likely outcome is for the driver to become horribly confused, fling its hands in the air, and shout "huh? you want me to do whaaaat???"

In all these cases things have gone badly wrong, and not in a clean way where D3D could report that a single method call was rejected. We know the D3D device has ended up in a different state from the one the program intended, but cannot describe exactly how things differ or what needs to happen to get everything back in sync. At this point D3D switches into a special "device removed" state. You can still call D3D methods, but they will be ignored and nothing will actually be rendered. When you eventually call Present, it will return one of these error codes:

  • DXGI_ERROR_DEVICE_REMOVED
  • DXGI_ERROR_DEVICE_RESET
  • DXGI_ERROR_DEVICE_HUNG
  • DXGI_ERROR_DRIVER_INTERNAL_ERROR
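A minimal sketch of reacting to these codes, assuming a portable stand-in for HRESULT; the constant values mirror the Windows SDK headers, while ClassifyPresentResult and FrameOutcome are hypothetical names. In real code you would pass in the result of IDXGISwapChain::Present, and once device removal is reported, ID3D11Device::GetDeviceRemovedReason can tell you which of the reasons applied, which is worth logging.

```cpp
#include <cstdint>

using HRESULT = std::int32_t;  // stand-in for the Windows typedef

// Values as defined in the Windows SDK headers.
constexpr HRESULT kDeviceRemoved       = static_cast<HRESULT>(0x887A0005u);  // DXGI_ERROR_DEVICE_REMOVED
constexpr HRESULT kDeviceHung          = static_cast<HRESULT>(0x887A0006u);  // DXGI_ERROR_DEVICE_HUNG
constexpr HRESULT kDeviceReset         = static_cast<HRESULT>(0x887A0007u);  // DXGI_ERROR_DEVICE_RESET
constexpr HRESULT kDriverInternalError = static_cast<HRESULT>(0x887A0020u);  // DXGI_ERROR_DRIVER_INTERNAL_ERROR

enum class FrameOutcome {
    Presented,       // success, including success status codes such as DXGI_STATUS_OCCLUDED
    RecreateDevice,  // device removed: tear everything down and start again
    OtherFailure     // some other error: log it and decide case by case
};

FrameOutcome ClassifyPresentResult(HRESULT hr) {
    switch (hr) {
        case kDeviceRemoved:
        case kDeviceHung:
        case kDeviceReset:
        case kDriverInternalError:
            return FrameOutcome::RecreateDevice;
        default:
            // Negative HRESULTs are failures; zero and positive values are successes.
            return hr < 0 ? FrameOutcome::OtherFailure : FrameOutcome::Presented;
    }
}
```

All four codes lead to the same action, because in every case the device state is no longer well defined and the only fix is a fresh device.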

The only way to recover from device removed state is to destroy the broken D3D device, destroy all its associated resources, create a new device, and reload a whole new set of fresh resources. Some apps go to great pains to robustly handle this situation, but many (most?) games don't bother, and will just crash in response, or stop rendering and need to be killed by the user.
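The recovery sequence can be sketched as follows. Device, Resource and Renderer are hypothetical stand-ins: a real app would hold ComPtr<ID3D11Device>, textures, state objects and so on, and would actually reload data from disk. What the sketch shows is the order: release everything that belongs to the broken device, release the device, then create a new device and rebuild every resource against it.

```cpp
#include <initializer_list>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for a D3D device and the objects created from it.
struct Device { int generation; };
struct Resource { std::string name; int deviceGeneration; };

class Renderer {
public:
    Renderer() { CreateDeviceAndResources(); }

    // Call this when Present reports that the device was removed.
    void HandleDeviceRemoved() {
        resources_.clear();          // 1. destroy everything created from the old device
        device_.reset();             // 2. destroy the broken device itself
        CreateDeviceAndResources();  // 3. new device, fresh resources
    }

    int DeviceGeneration() const { return device_ ? device_->generation : 0; }
    const std::vector<Resource>& Resources() const { return resources_; }

private:
    void CreateDeviceAndResources() {
        device_ = std::make_unique<Device>(Device{++generationCounter_});
        // Every resource is recreated against the new device; none of the
        // old objects can be carried over.
        for (const char* name : {"render target view", "blend state", "terrain texture"}) {
            resources_.push_back(Resource{name, device_->generation});
        }
    }

    std::unique_ptr<Device> device_;
    std::vector<Resource> resources_;
    int generationCounter_ = 0;
};
```

Routing all creation through one function that is used both at startup and after device removal keeps the recovery path exercised on every launch, rather than living as rarely run code that only matters after a TDR.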
Note that D3D11 device removed state is different from the old D3D9 device lost behavior, which occurred any time the monitor switched between windowed and fullscreen mode (e.g. when pressing alt+tab). This was a common occurrence, so all D3D9 apps had to handle device lost. In D3D11, display mode switches are handled by the runtime without any special effort from the app. Device removed is reserved for situations where the state of the device is not well defined and thus no automatic recovery is possible, which is much rarer.
If you frequently run into device removed errors, the first step is to turn on the debug D3D layer and make sure these are not just knock-on consequences of an earlier boneheaded error.
