A Quick Introduction to Sampler-Based Profiling

Published September 18, 2014

Sampler-Based Profiling: The Quick Version


So you're happily working on some code, and suddenly it happens: everything is just too damn slow! Something is eating up all your performance, but it's not immediately obvious what to do about it.

One of the first things any experienced programmer will tell you is to profile. In a nutshell, this is grizzled, beard-wearing programmer shorthand for "measure things and be scientific about your approach to performance." Even some of the most brilliant programmers in the world find it hard to intuitively pinpoint the real performance bottlenecks in a complex piece of code. So don't rely on voodoo and ritualism to find your slow points - measure your code and act accordingly.

Picking a Profiler
There are fundamentally two approaches to profiling. One is instrumentation, and the other is sampling. Instrumentation means adding some magic to your code that times how long the program spends executing various functions. It can be as simple as subtracting a couple of timer values, or as complex as invasive changes to the entire program that do things like automatically record call stacks and such. The "industrial strength" profilers generally support this mode, although in practice I find it very hard to use for things like games, for one simple reason: it dramatically changes the behavior of your program when you want to do something in near real-time.
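
To make the "subtracting a couple of timer values" version concrete, here's a minimal hand-rolled sketch of my own (C++ with std::chrono; UpdatePhysics is just a made-up stand-in for whatever you want to time):

    #include <chrono>
    #include <cstdio>

    // Made-up function we want to time.
    void UpdatePhysics()
    {
        // ... real work would go here ...
    }

    int main()
    {
        using Clock = std::chrono::steady_clock;

        // Manual instrumentation: grab a timestamp before and after the call,
        // then subtract the two to get the elapsed time.
        const auto start = Clock::now();
        UpdatePhysics();
        const auto end = Clock::now();

        const double elapsedMs =
            std::chrono::duration_cast<std::chrono::microseconds>(end - start).count() / 1000.0;
        std::printf("UpdatePhysics took %.3f ms\n", elapsedMs);
        return 0;
    }

The fancier instrumenting profilers automate this idea by hooking every function entry and exit, which is exactly where the overhead and the behavior changes come from.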

So I will suggest, if you're new to profiling, that you start with a sampling-based profiler. By now you have all the keywords you need to find one on the Internet, so I won't recommend anything in particular (people can be very dogmatic about what their favorite tools are).


Sampling and How it Works
The basic idea behind sampling profilers is simple statistics. Take a program that's already running. Freeze it in place, like hitting the Pause button, and note what code was running when you stopped the world. Optionally, record a call stack as well to get contextual information. Once the snapshot is taken, un-pause the program and let it continue running. Do this thousands of times, and you will - statistically speaking - slowly build up a picture of where and why your program is slow.
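
If you want to see the statistics in action, here's a deliberately dumbed-down toy sampler I sketched for illustration. It is not a real profiler - there's no interrupting of threads and no stack walking; instead the program cooperatively publishes a "current phase" marker that a background thread samples on a timer. The phases and timings are invented purely for the demo:

    #include <atomic>
    #include <chrono>
    #include <cstdio>
    #include <thread>

    // Toy stand-in for "what code is running right now". A real sampling
    // profiler interrupts the process and reads the program counter / call
    // stack; here the program publishes a phase ID itself so the example
    // stays self-contained.
    std::atomic<int> g_currentPhase{0};
    std::atomic<bool> g_running{true};
    long g_sampleCounts[3] = {0, 0, 0};

    void SamplerThread()
    {
        while (g_running.load())
        {
            // "Freeze-frame": note which phase the program is in, then sleep.
            ++g_sampleCounts[g_currentPhase.load()];
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
    }

    void BusyWork(std::chrono::milliseconds duration)
    {
        const auto until = std::chrono::steady_clock::now() + duration;
        while (std::chrono::steady_clock::now() < until) { /* spin */ }
    }

    int main()
    {
        std::thread sampler(SamplerThread);

        // Phase 0 is cheap, phase 1 is the "hot" one, phase 2 is in between.
        g_currentPhase = 0; BusyWork(std::chrono::milliseconds(100));
        g_currentPhase = 1; BusyWork(std::chrono::milliseconds(600));
        g_currentPhase = 2; BusyWork(std::chrono::milliseconds(300));

        g_running = false;
        sampler.join();

        const long total = g_sampleCounts[0] + g_sampleCounts[1] + g_sampleCounts[2];
        for (int i = 0; i < 3; ++i)
            std::printf("phase %d: %ld samples (%.1f%%)\n",
                        i, g_sampleCounts[i], 100.0 * g_sampleCounts[i] / total);
        return 0;
    }

Run it a few times and you'll see the percentages hover around 10/60/30 but never land on exactly the same numbers - which is the statistical fuzziness discussed next.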

It is important to remember two things about sample-based profilers. First, they are statistical estimates only. They will give you slightly different results each time you run the program. Therefore, running as much code as possible is in your best interests - this gives you the opportunity to get lots of data and make a more accurate statistical picture. I usually run for something like 10,000 to 20,000 samples, to give you a ballpark idea.

Second, sampling will tend to downplay the importance of very tiny pieces of code. Statistically, it is easy to see why this has to be true: since the piece of code is tiny, the odds of freezing the program exactly in that spot are correspondingly slim. You might land just before it, or just after it, for example. If your program is largely built up of equally slow tiny bits of code, sampling might make it impossible to find a bottleneck.
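
To put rough numbers on it: a chunk of code that accounts for 0.1% of the runtime will average only about 10 hits in a 10,000-sample run, so its measured share can easily swing by a third or more between runs, while a chunk eating 20% of the runtime collects around 2,000 hits and shows up very consistently.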

That said, sampling is still a great tool, and an easy way to get started with profiling. So let's talk about how to use it.


Using Sampling
Typically, profilers will show you two crucial stats about a given piece of code: how often it was seen in the freeze-frame snapshots (sample count), and a rough guess at how much time was spent in that chunk of code across the entire profiling session (aggregate time). Some profilers distinguish between time spent in just the code (exclusive time) versus time spent in the code and everything that code calls (inclusive time). These are all useful measurements in different situations, so don't get too attached to any single statistic.
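
As a made-up illustration: suppose a 10,000-sample run attributes 1,800 samples to the body of a hypothetical UpdateAI() function and another 3,000 samples to a FindPath() routine it calls. UpdateAI()'s exclusive share is 18%, but its inclusive share is 48%, because the callee's samples roll up into it.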

For introductory purposes, often the easiest way to fix performance issues is to look at the combination of inclusive time and sample count. This tells you (roughly) what fraction of a program's life is spent chugging through a given piece of code. Any decent profiler will have a sorted breakdown that lets you view your worst offenders in some easy format.

Once you have identified a slow piece of code, it takes some analysis of the program itself to understand why it is slow. Sometimes, you might find a piece of code can't be optimized, but chews up a lot of profiling samples - this is common with high-throughput inner loops, for instance. So don't be afraid to use different stats (especially exclusive time) and look further down the list for potential wins.


Pitfalls and Gotchas
There are a few more things worth mentioning; I'll only hit them at a high level, though, because the details vary a lot based on OS, compiler, and so on.

  • Memory allocations can hide in weird places. Look for mysterious samples in system code, for example. Learn to recognize the signs of memory allocation and deallocation - even if you're in a garbage-collected language, these can be important. (There's a small example of this right after the list.)
  • Multithreading is the enemy of sample-based profiling, because blocking a thread looks an awful lot like chewing up CPU on the thread. Note that some good profilers can tell the difference, which is nice.
  • Statistics are lies. If your profiler is telling you something profoundly confusing, seek advice on whether you're missing something in the code, or if the profiler is just giving you an incomplete statistical picture of reality.
  • Practice is important, but so is realistic work. Why didn't I include an example program and profiling screenshots? Because real programs are much harder to profile than simple examples. If you want practice, work on real programs whenever possible, because you'll learn a lot more.
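
On that first point, here's a small, made-up C++ example of the kind of allocation that likes to hide in plain sight (the function and data are purely illustrative):

    #include <cstdio>
    #include <string>
    #include <vector>

    // Taking `needle` by value means each call that passes a string literal
    // or an existing std::string builds a fresh copy of it - and for strings
    // longer than the small-string optimization buffer, that copy heap-
    // allocates. In a sampling profiler the cost tends to show up as
    // mysterious samples in operator new / the system heap, not on this line.
    int CountMatches(std::string needle, const std::vector<std::string>& haystack)
    {
        int count = 0;
        for (const std::string& entry : haystack)
            if (entry == needle)
                ++count;
        return count;
    }

    int main()
    {
        const std::vector<std::string> names = {
            "a rather long entity name that will not fit in any SSO buffer",
            "another fairly long entity name, also heap-allocated",
        };

        // Each iteration builds (and heap-allocates) a fresh needle string.
        // Passing it by const reference instead removes the hidden allocation.
        int total = 0;
        for (int i = 0; i < 100000; ++i)
            total += CountMatches(
                "a rather long entity name that will not fit in any SSO buffer",
                names);

        std::printf("total matches: %d\n", total);
        return 0;
    }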
Enjoy!

Comments

Washu

Practice is important, but so is realistic work. Why didn't I include an example program and profiling screenshots? Because real programs are much harder to profile than simple examples. If you want practice, work on real programs whenever possible, because you'll learn a lot more.

This.

THIS.

Cannot stress this enough: THIS

September 18, 2014 12:39 AM
KATAPLEXIA

Statistics are lies.

For me, this was the most impactful sentence. I cannot even begin to count how many times my profiling has revealed strange allocations that have sent me off on a hunt - only to find I have been looking in the wrong places.

I agree. It is only through practice that one can learn to find bottlenecks quickly, digesting a block of complex code alongside the use of profiling.

September 21, 2014 06:13 AM
christma

[A profiler] times how long the program spends executing various functions.

One important aspect to consider when profiling a multithreaded program running on a multicore CPU:

If a section of your code (e.g. a loop) is running on multiple cores concurrently, then a sampling profiler might identify this code section as a hot spot - even though the execution time for this section might actually be low (compared to other sections of the code).

The reason is that sampling counts a code section multiple times when it is executing concurrently on several cores at the moment the sample is taken. Hence, in that case sampling actually measures the work done in the different code sections of the program, not their execution time.

As far as I know, to find hotspots of a parallel program running on a multicore CPU, other forms of profiling must be used: instead of deriving execution time from sampling, the actual execution time of the different code sections has to be measured directly.

September 23, 2014 09:26 AM