AsmProfiler

AsmProfiler is a 32-bit instrumenting and sampling profiler written by André Mussche. Its source, along with Windows exe, can be found at https://github.com/andremussche/asmprofiler. Although the latest version was released in 2016, it works well with the newest Delphi at the time of writing this book, 10.2 Tokyo.

The sampling and instrumenting profilers are used in a different way, so I'll cover them separately.

To use AsmProfiler you first have to unpack the release ZIP into some folder on your disk. That will create two subfolders—Sampling and Instrumenting. To start the sampling profiler, start AsmProfiling_Sampling from the Sampling folder and then click Start profiling.

AsmProfiler has two ways of starting a profiling session. You can start the program manually, click Select process in AsmProfiler, and select your program from the list of running processes. Alternatively, you can click Select exe, browse to the compiled EXE of your program, and then click Start process now, which will start the program or click Start sampling which will start the program and also start sampling the code.

The reasoning behind the two different ways of starting the program is that mostly you want to profile a specific part of the program, so you would load it into the profiler, navigate to the specific part, click Start sampling, do the required steps in your program, click Stop sampling, and analyze the results. Sometimes, however, you would want to profile the very startup of the program so you would want sampling to start immediately after your program is launched. The following screenshot shows the initial AsmProfiler screen:

Starting profiling session in AsmProfiler

Sampling options can be configured in the Options group. You can set the sampling interval—how many milliseconds will pass between two samples. Setting this value to 4, for example, will generate 250 (1000/4) samples per second.

Setting the sampling interval to 0 enables a continuous sampling mode which still leaves some time to threads running on the same CPU. (The sampling code calls Sleep(0) between taking two samples.) Setting it to -1 causes the samples to be taken as fast as possible. In effect, AsmProfiler will use one CPU core for itself.

You can also set the priority of sample-taking threads from the default,  Normal. I will discuss threads and priorities in Chapter 5, Getting Started with the Parallel World.

After you have taken the samples and clicked Stop sampling, click the Show results button to analyze the results.

If you want to compare your results with mine, here are steps to produce the results (as shown in the following screenshot):

  1. Start AsmProfiler.
  2. Click Select exe and select the SlowCode.exe.
  3. Click Start process now.
  4. Click Start sampling.
  5. Enter 100,000 into the program.
  6. Click Stop sampling.
  7. Enter 0 into the program to exit.
  8. Click Show results.
  1. Click Results on the Sampling Results form.
Result of a test run profiled with AsmProfiler's Sampling profiler

AsmProfiler displays results organized by threads, but it automatically highlights the main thread so you don't have to care about that detail if your program doesn't use multithreading.

In the result grid, it shows the module (main EXE or DLL), name of the function, how many times the code was found to be in this function, and times (absolute and percentage) the code spent in that function (Own), in all functions called from it (Child), and both in that function and in all functions called from it (Own+Child).

If we sort results by the time spent only in the function (Own time), we'll see that function TextIn$qqrr15System.TTextRec comes to the top. This a function that reads console input (in the Readln statement) and we can safely ignore it. 

The next one on the list, ElementInDataDivides$qqrp40..., is the one that interests us. We can see that it was sampled 371 times (Calls) and that it needed 0.6 seconds to execute. If you switch to the Detailed view tab and select this function, you'll see in the Parent calls (called by...) panel that it was called 258 times from SlowMethod and 113 times from the Filter method. In reality, of course, it was called many more times, but most of them were not seen by the sampling profiler.

In this view, we can also see how many times each line of some method was hit, which will give us a good idea about where the program spends most of the time. Unfortunately, we cannot sort the data on different criteria:

Detailed view showing information for each line of the program

The names of the methods in these outputs are very weird, but that is how the Delphi compiler calls these methods internally. The part after the $ encodes the parameter types. This process is called name mangling (and the part after $ is sometimes referred to as a decoration) and it enables us to use overloaded methods with the same name and different parameters—internally they all have different names.

For example, the function SlowMethod(highBound: Integer) is internally known as SlowMethod$qqri. The qqr part specifies the fastcall calling convention (it describes how parameters are passed to the function in registers and on stack) and i identifies one integer parameter.

AsmProfiler's instrumenting profiler requires a bit more work. Firstly, you have to copy AsmProfiler.dll from the Instrumenting subfolder into a folder on the Windows environment path or into the exe folder. Secondly, you have to copy Instrumenting\API\_uAsmProfDllLoader.pas into Delphi's library path, into your project's folder or add this folder to your project's search path.

Thirdly, you have to add the _uAsmProfDllLoader unit to your project and call the following code. This will show the profiler's main form on the screen:

if _uAsmProfDllLoader.LoadProfilerDll then
_uAsmProfDllLoader.ShowProfileForm;

To start profiling, call _uAsmProfDllLoader.StartProfiler(False);. To stop collecting data, call  _uAsmProfDllLoader.StopProfiler;. These two calls are not strictly necessary. You can also start and stop profiling from the profiler's user interface. Modifying the code will, however, give you more control over the profiling process.

Before your program exits, unload the profiler DLL with _uAsmProfDllLoader.UnLoadProfilerDll;.

Make sure that your program has the following compiler options correctly set:

  • Linker, Map file = detailed
  • Compiler, Optimization = off
  • Compiler, Stack frames = on

The instrumenting profiler requires your program to process messages which makes it mostly useless when used in Mr. Smith's program, as it is spending most of the time inside a Readln call (and is not processing messages). As I still wanted to show you how this profiler works, I have converted SlowCode into a more modern VCL version, SlowCode_VCL.

At first I wanted to start/stop profiling right in SlowMethod:

function TfrmSlowCode.SlowMethod(highBound: Integer): TArray<Integer>;
var
i: Integer;
temp: TList<Integer>;
begin
_uAsmProfDllLoader.StartProfiler(False);

// existing code

_uAsmProfDllLoader.StopProfiler;
end;

That attempt, however, misfired, as AsmProfiler didn't want to show profiling results for SlowCode. It turned out to be better to move the start/stop calls out of this method and into the method which calls SlowCode:

procedure TfrmSlowCode.btnTestClick(Sender: TObject);
var
data: TArray<Integer>;
begin
outResults.Text := '';
outResults.Update;

_uAsmProfDllLoader.StartProfiler(False);
data := SlowMethod(inpHowMany.Value);
_uAsmProfDllLoader.StopProfiler;

ShowElements(data);
end;

A version of the program, ready for profiling with the AsmProfiler, is stored in the SlowCode_VCL_Instrumented project. You will still have to download AsmProfiler and store AsmProfiler.dll and _uAsmProfDllLoader.pas into appropriate places.

When you start the program, a small form will appear alongside the program's main form. From here you can start and stop profiling, select items (methods) that should be profiled (Select items), and open results of the profiling session (Show results):

AsmProfiler's instrumenting profiler

We are interested only in three methods, so click the Select items button, select the ElementInDataDivides, Filter and SlowMethod methods, and click OK:

Selecting methods to be profiled

Next, enter 100000 into the How many numbers field and click the Test button. You don't have to start and stop the profiler, as the program will do that. When the values are calculated and displayed on the screen, click the Show results button. Don't close the profiled program as that would close the profiler form, too.

The result form of the instrumenting profiler is very similar to the equivalent form of the sampling profiler. The most interesting feature is the Unit overview tab, which combines detailed timing information and a call tree:

Unit overview display

We can see that ElementInDataDivides is in fact called 99,999 times directly from the SlowMethod and only 9,592 times from the Filter method, not 258 and 113 times, as shown by the sampling profiler.

AsmProfiler gives a good combination of a global overview and detailed analysis, although it is rough around the edges and requires more effort on your part than more polished commercial profilers.