- Mastering Delphi Programming: A Complete Reference Guide
- Primož Gabrijelčič
- 2021-06-24 12:33:34
Record field alignment
The third compiler option I'd like to discuss regulates the alignment of fields in Delphi record and class types. It can be set to the following values: Off, Byte, Word, Double Word, and Quad Word. These settings are a bit misleading, as the first two values (Off and Byte) actually result in the same behavior.
You can use the compiler directives {$ALIGN 1}, {$ALIGN 2}, {$ALIGN 4}, and {$ALIGN 8} to change record field alignment in code, or the equivalent short forms {$A1}, {$A2}, {$A4}, and {$A8}. There are also two directives that exist only for backward compatibility: {$A+} means the same as {$A8} (which is also the default for new programs) and {$A-} is the same as {$A1}.
Field alignment controls exactly how fields in records and classes are laid out in memory.
Let's say that we have the following record, and that the address of the first field in the record is simply 0:
type
  TRecord = record
    Field1: byte;
    Field2: int64;
    Field3: word;
    Field4: double;
  end;
With the {$A1} alignment, each field will simply follow the previous one. In other words, Field2 will start at address 1, Field3 at 9, and Field4 at 11. As the size of double is 8 (as we'll see later in this chapter), the total size of the record is 19 bytes.
With the {$A2} alignment, each field will start on a word boundary. In layman's terms, the address of the field (offset from the start of the record) will be divisible by 2. Field2 will start at address 2, Field3 at 10, and Field4 at 12. The total size of the record will be 20 bytes.
With the {$A4} alignment, each field will start on a double word boundary so its address will be divisible by 4. (You can probably see where this is going.) Field2 will start at address 4, Field3 at 12, and Field4 at 16. The total size of the record will be 24 bytes.
Finally, with the {$A8} alignment, each field will start on a quad word boundary so its address will be divisible by 8. Field2 will start at address 8, Field3 at 16, and Field4 at 24. The total size of the record will be 32 bytes.
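The offsets above are easy to check on your own compiler. The following is a minimal sketch, not part of the book's demo code: the names TRecA1 to TRecA8, Offset, and Report are made up for this illustration, and the program assumes a Delphi version that knows the NativeUInt type. It declares the same record four times, each under a different $A setting, and prints where each field starts.

```delphi
program RecordAlignSketch;

{$APPTYPE CONSOLE}

type
  {$A1}
  TRecA1 = record
    Field1: Byte;
    Field2: Int64;
    Field3: Word;
    Field4: Double;
  end;

  {$A2}
  TRecA2 = record
    Field1: Byte;
    Field2: Int64;
    Field3: Word;
    Field4: Double;
  end;

  {$A4}
  TRecA4 = record
    Field1: Byte;
    Field2: Int64;
    Field3: Word;
    Field4: Double;
  end;

  {$A8}
  TRecA8 = record
    Field1: Byte;
    Field2: Int64;
    Field3: Word;
    Field4: Double;
  end;

// Distance in bytes between the start of a record and one of its fields.
function Offset(Base, Field: Pointer): NativeUInt;
begin
  Result := NativeUInt(Field) - NativeUInt(Base);
end;

// Prints the offsets of Field2..Field4 and the total record size.
procedure Report(const Name: string; Off2, Off3, Off4, Size: NativeUInt);
begin
  Writeln(Name, ': Field2 at ', Off2, ', Field3 at ', Off3,
    ', Field4 at ', Off4, ', size ', Size);
end;

var
  R1: TRecA1;
  R2: TRecA2;
  R4: TRecA4;
  R8: TRecA8;

begin
  Report('A1', Offset(@R1, @R1.Field2), Offset(@R1, @R1.Field3),
    Offset(@R1, @R1.Field4), SizeOf(R1));
  Report('A2', Offset(@R2, @R2.Field2), Offset(@R2, @R2.Field3),
    Offset(@R2, @R2.Field4), SizeOf(R2));
  Report('A4', Offset(@R4, @R4.Field2), Offset(@R4, @R4.Field3),
    Offset(@R4, @R4.Field4), SizeOf(R4));
  Report('A8', Offset(@R8, @R8.Field2), Offset(@R8, @R8.Field3),
    Offset(@R8, @R8.Field4), SizeOf(R8));
end.
```

If your compiler and target platform behave as described in this section, the four lines of output should repeat the offsets and sizes listed above. Any difference is worth investigating before you rely on a particular layout.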
Having said all that, I have to add that the $A directive doesn't function exactly as I described it. Delphi knows how simple data types should be aligned (for example, it knows that an integer should be aligned on a double word boundary) and will not move them to a higher alignment, even if one is explicitly specified by a directive. For example, the following record will use only 8 bytes even though we explicitly stated that fields should be quad word aligned:
{$A8}
type
  TIntegerPair = record
    a: integer;
    b: integer;
  end;
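A quick way to confirm this is to ask the compiler for the numbers. The short program below is only a sketch (the program name and the Pair variable are invented for the example). It declares the record under {$A8} and prints the offset of b and the size of the record.

```delphi
program IntegerPairSketch;

{$APPTYPE CONSOLE}

{$A8}
type
  TIntegerPair = record
    a: Integer;
    b: Integer;
  end;

var
  Pair: TIntegerPair;

begin
  // Offset of b from the start of the record, in bytes.
  Writeln('Offset of b: ', NativeUInt(@Pair.b) - NativeUInt(@Pair));
  // Total size of the record, in bytes.
  Writeln('SizeOf(TIntegerPair): ', SizeOf(TIntegerPair));
end.
```

If the behavior described above holds, b sits directly after a and the reported size is 8 bytes, not the 16 bytes that a strict reading of "quad word aligned" would suggest.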
If you need to specify the exact size and alignment of all fields (for example, if you pass records to some API call), it is best to use the packed record directive and insert unused padding fields into the definition. The next example specifies a record containing two quad word aligned integers:
type
  TIntegerPair = packed record
    a: integer;
    filler: integer;
    b: integer;
  end;
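When a layout has to match an external API, it can help to let the program verify its own assumptions. The following sketch is one possible way to do that, not a technique taken from the book. It uses Assert to check that b really starts at offset 8 and that the packed record occupies 12 bytes (three 4-byte integers with no padding). The program name and the Pair variable are made up, and the System.SysUtils unit name assumes a Delphi version with unit scope names.

```delphi
program PackedPairSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils; // turns a failed Assert into an EAssertionFailed exception

type
  TIntegerPair = packed record
    a: Integer;
    filler: Integer; // unused, only pushes b to offset 8
    b: Integer;
  end;

var
  Pair: TIntegerPair;

begin
  // A packed record has no hidden padding, so offsets follow from field sizes.
  Assert(NativeUInt(@Pair.a) - NativeUInt(@Pair) = 0, 'a must start at 0');
  Assert(NativeUInt(@Pair.b) - NativeUInt(@Pair) = 8, 'b must start at 8');
  Assert(SizeOf(TIntegerPair) = 12, 'record must be 12 bytes');
  Writeln('Layout checks passed');
end.
```

Assertions are compiled in only when the $C+ (assertions) option is active, so a check like this is best kept in a debug build or a unit test.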
The following image shows how TRecord, the four-field record introduced earlier, is laid out in memory with different record field alignment settings. Fields are renamed F1 to F4 so that their names fit in the available space. X marks unused memory:
Why is all this useful? Why don't we always just pack fields together so that the total size of a record or class is as small as possible? Well, that is an excellent question!
As traditional wisdom says, CPUs work faster when the data is correctly aligned. Accessing four-byte data (an integer, for example) is faster if its address is double word aligned (divisible by four). Similarly, two-byte data (word) should be word aligned (address divisible by two) and eight-byte data (int64) should be quad word aligned (address divisible by eight). This will significantly improve performance in your program.
Will it really? Does this traditional wisdom make any sense in the modern world?
The CompilerOptions demo contains sets of measurements done on differently aligned records. It is triggered with the Record field align button.
Running the test shows something surprising—all four tests (for A1, A2, A4, and A8) run at almost the same speed. Actually, the code operating on the best-aligned record (A8) is the slowest! I must admit that I didn't expect this while preparing the test.
A little detective work has shown that somewhere around the year 2010, Intel did a great job optimizing data access on its CPUs. If you manage to find an older machine, it will show a big difference between unaligned and aligned data. However, all Intel CPUs produced after that time will run on unaligned and aligned data at the same speed. Working on unaligned (packed) data may actually be faster, as more data will fit into the processor cache.
What is the moral lesson of all that? Guessing is nothing, hard numbers are all! Always measure. There is no guarantee that your changes will actually speed up the program.
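If you want to take your own measurements, something along the following lines may serve as a starting point. This is a rough sketch and not the code from the CompilerOptions demo: the record names, the constants CCount and CPasses, and the functions TimeA1 and TimeA8 are all invented for this example. It assumes a Delphi version that ships TStopwatch in the System.Diagnostics unit. The program sums an int64 field over a large array of {$A1} records and then over an array of {$A8} records, and reports the time for each.

```delphi
program AlignBenchmarkSketch;

{$APPTYPE CONSOLE}

uses
  System.Diagnostics; // TStopwatch

const
  CCount  = 1000000; // records per array (arbitrary choice)
  CPasses = 100;     // passes over each array (arbitrary choice)

type
  {$A1}
  TRecA1 = record
    Field1: Byte;
    Field2: Int64;
    Field3: Word;
    Field4: Double;
  end;

  {$A8}
  TRecA8 = record
    Field1: Byte;
    Field2: Int64;
    Field3: Word;
    Field4: Double;
  end;

// Sums Field2 over byte-aligned records; returns elapsed milliseconds.
function TimeA1(out Sum: Int64): Int64;
var
  Data: array of TRecA1;
  Watch: TStopwatch;
  Pass, I: Integer;
begin
  SetLength(Data, CCount);
  for I := 0 to High(Data) do
    Data[I].Field2 := I;
  Sum := 0;
  Watch := TStopwatch.StartNew;
  for Pass := 1 to CPasses do
    for I := 0 to High(Data) do
      Inc(Sum, Data[I].Field2);
  Result := Watch.ElapsedMilliseconds;
end;

// Same loop over quad word aligned records.
function TimeA8(out Sum: Int64): Int64;
var
  Data: array of TRecA8;
  Watch: TStopwatch;
  Pass, I: Integer;
begin
  SetLength(Data, CCount);
  for I := 0 to High(Data) do
    Data[I].Field2 := I;
  Sum := 0;
  Watch := TStopwatch.StartNew;
  for Pass := 1 to CPasses do
    for I := 0 to High(Data) do
      Inc(Sum, Data[I].Field2);
  Result := Watch.ElapsedMilliseconds;
end;

var
  Ms, Sum: Int64;

begin
  Ms := TimeA1(Sum);
  Writeln('A1: ', Ms, ' ms, checksum ', Sum);
  Ms := TimeA8(Sum);
  Writeln('A8: ', Ms, ' ms, checksum ', Sum);
end.
```

The checksum is printed so that the compiler cannot discard the loop as dead code, and both lines should show the same value. Treat the timings as indicative only: run the program several times, in a release build, and on the hardware you actually care about.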