Optimizing strings and array allocations

When you create a string, the code allocates memory for its content, copies the content into that memory, and stores the address of this memory in the string variable. Of course, you all know that by now as this was the topic of the previous chapter.

If you append a character to this string, it must be stored somewhere in that memory. However, there is no place to store the string. The original memory block was just big enough to store the original content. The code must therefore enlarge that memory block, and only then can the appended character be stored in the newly acquired space.

As we'll see further on in the chapter, FastMM tries to make sure that the memory can be expanded in place, that is, without copying the original data into a new, larger block of memory. Still, this is not always possible and sometimes data must be copied around.

A very similar scenario plays out when you extend a dynamic array. Memory that contains the array data can sometimes be extended in place (without moving), but often this cannot be done.

If you do a lot of appending, these constant reallocations will start to slow down the code. The Reallocation demo shows a few examples of such behavior and possible workarounds.

The first example, activated by the Append String button, simply appends the '*' character to a string 10 million times. The code looks simple, but the s := s + '*' assignment hides a potentially slow string reallocation:

procedure TfrmReallocation.btnAppendStringClick(Sender: TObject);
var
s: String;
i: Integer;
begin
s := '';
for i := 1 to CNumChars do
s := s + '*';
end;

By now, you probably know that I don't like to present problems that I don't have solutions for and this is not an exception. In this case, the solution is called SetLength. This function sets a string to a specified size. You can make it shorter, or you can make it longer. You can even set it to the same length as before.

In case you are enlarging the string, you have to keep in mind that SetLength will allocate enough memory to store the new string, but it will not initialize it. In other words, the newly allocated string space will contain random data.

A click on the SetLength String button activates the optimized version of the string appending code. As we know that the resulting string will be CNumChars long, the code can call SetLength(s, CNumChars) to preallocate all the memory in one step. After that, we should not append characters to the string as that would add new characters at the end of the preallocated string. Rather, we have to store characters directly into the string by writing  to s[i]:

procedure TfrmReallocation.btnSetLengthClick(Sender: TObject);
var
s: String;
i: Integer;
begin
SetLength(s, CNumChars);
for i := 1 to CNumChars do
s[i] := '*';
end;

Comparing the speed shows that the second approach is significantly faster. It runs in 33 ms instead of the original 142 ms.

This code still calls some string management functions internally, namely UniqueString. It returns fairly quickly, but still represents some overhead. If you are pushing for the fastest possible execution, you can initialize the string by using a  sliding pointer, just as in the example from the previous chapter.

A similar situation happens when you are extending a dynamic array. The code triggered by the Append array button shows how an array may be extended by one element at a time in a loop. Admittedly, the code looks very weird as nobody in their right mind would write a loop like this. In reality, however, similar code would be split into multiple longer functions and may be hard to spot:

procedure TfrmReallocation.btnAppendArrayClick(Sender: TObject);
var
arr: TArray<char>;
i: Integer;
begin
SetLength(arr, 0);
for i := 1 to CNumChars do begin
SetLength(arr, Length(arr) + 1);
arr[High(arr)] := '*';
end;
end;

The solution is similar to the string case. We can preallocate the whole array by calling the SetLength function and then write the data into the array elements. We just have to keep in mind that the first array element always has index 0:

procedure TfrmReallocation.btnSetLengthArrayClick(Sender: TObject);
var
arr: TArray<char>;
i: Integer;
begin
SetLength(arr, CNumChars);
for i := 1 to CNumChars do
arr[i-1] := '*';
end;

Improvements in speed are similar to the string demo. The original code needs 230 ms to append ten million elements, while the improved code executes in 26 ms.

The third case when you may want to preallocate storage space is when you are appending to a list. As an example, I'll look into a TList<T> class. Internally, it stores the data in a TArray<T>, so it again suffers from constant memory reallocation when you are adding data to the list.

The short demo code appends 10 million elements to a list. As opposed to the previous array demo, this is a completely normal looking code, found many times in many applications:

procedure TfrmReallocation.btnAppendTListClick(Sender: TObject);
var
list: TList<Char>;
i: Integer;
begin
list := TList<Char>.Create;
try
for i := 1 to CNumChars do
list.Add('*');
finally
FreeAndNil(list);
end;
end;

To preallocate memory inside a list, you can set the Capacity property to an expected number of elements in the list. This doesn't prevent the list from growing at a later time; it just creates an initial estimate. You can also use Capacity to reduce memory space used for the list after deleting lots of elements from it.

The difference between a list and a string or an array is that, after setting Capacity, you still cannot access list[i] elements directly. Firstly you have to Add them, just as if Capacity was not assigned:

procedure TfrmReallocation.btnSetCapacityTListClick(Sender: TObject);
var
list: TList<Char>;
i: Integer;
begin
list := TList<Char>.Create;
try
list.Capacity := CNumChars;
for i := 1 to CNumChars do
list.Add('*');
finally
FreeAndNil(list);
end;
end;

Comparing the execution speed shows only a small improvement. The original code executed in 167 ms, while the new version needed 145 ms. The reason for that relatively small change is that TList<T> already manages its storage array. When it runs out of space, it will always at least double the previous size. Internal storage therefore grows from 1 to 2, 4, 8, 16, 32, 64, ... elements.

This can, however, waste a lot of memory. In our example, the final size of the internal array is 16,777,216 elements, which is about 60% elements too many. By setting the capacity to the exact required size, we have therefore saved 6,777,216 * SizeOf(Char) bytes or almost 13 megabytes.

Other data structures also support the Capacity property. We can find it in TList, TObjectList, TInterfaceList, TStrings, TStringList, TDictionary, TObjectDictionary and others.