In Part I of this blog post we introduced the concept of virtual class variables - a feature currently (Delphi 2007) lacking from Object Pascal (and most other languages). We also covered a potential syntax and suggested some compiler implementation details. In this post we will continue by looking at some hacks that implement the functionality of virtual class fields manually. The original idea is Patrick van Logchem's.
Hacking a solution
While we wait for CodeGear to eventually implement (or not) support for these virtual class fields, what should we do? This is where my little email conversation with Patrick van Logchem of everyangle.com comes into play. From his original email to me:
[...] Anyway, I discovered Delphi got class variables since version 2005, but I needed something a bit more specific: Class-specific variables. This is not a standard language construct, because a class var is just another type of global; not much use in my case - I want this:
TClass1 = class(TObject)
public
class property Variable: Type;
end;
TClass2 = class(TClass1);
and then TClass1.Variable <> TClass2.Variable. In words: when declaring a variable of this kind, the class itself and all its derived classes should have their own version of this variable.
This matches exactly the virtual class vars we have been discussing in Part I. Not content with the missing language support, Patrick did what any true hacker would have done - he devised his own solution. Patrick continues:
I haven't found a clean language-construct for such a simple requirement, so I started hacking. To make this work, I (ab-)use a slot in the VMT as a variable! Here a slightly edited cut-'n-paste of our production code:
type
PClass = ^TClass;
TClassInfo = class(TObject);
TBasicObject = class(TObject)
strict private
procedure VMT_Placeholder1; virtual;
protected
class procedure SetClassInfo(const aClassInfo: TClassInfo);
public
class procedure InitVMTPlaceholders; virtual;
function GetClassInfo: TClassInfo; inline;
class function ClassGetClassInfo: TClassInfo; inline;
end;
PBasicObjectOverlay = ^RBasicObjectOverlay;
RBasicObjectOverlay = packed record
OurClassInfo: TClassInfo;
end;
procedure PatchCodeDWORD(Code: PDWORD; Value: DWORD);
var
RestoreProtection, Ignore: DWORD;
begin
if VirtualProtect(Code, SizeOf(Code^), PAGE_EXECUTE_READWRITE,
RestoreProtection) then
begin
Code^ := Value;
VirtualProtect(Code, SizeOf(Code^), RestoreProtection, Ignore);
FlushInstructionCache(GetCurrentProcess, Code, SizeOf(Code^));
end;
end;
class procedure TBasicObject.InitVMTPlaceholders;
begin
if Pointer(ClassGetClassInfo) = Addr(TBasicObject.VMT_Placeholder1) then
begin
PatchCodeDWORD(@PBasicObjectOverlay(Self).OurClassInfo, DWORD(nil));
Assert(ClassGetClassInfo = nil, 'Failed cleaning VMT of ' + ClassName);
end
else
Assert(ClassGetClassInfo = nil,
'Illegal value when checking initialized VMT of ' + ClassName);
end;
function TBasicObject.GetClassInfo: TClassInfo;
begin
Result := PBasicObjectOverlay(PClass(Self)^).OurClassInfo;
end;
class function TBasicObject.ClassGetClassInfo: TClassInfo;
begin
Result := PBasicObjectOverlay(Self).OurClassInfo;
end;
class procedure TBasicObject.SetClassInfo(const aClassInfo: TClassInfo);
begin
PatchCodeDWORD(@PBasicObjectOverlay(Self).OurClassInfo, DWORD(aClassInfo));
end;
procedure TBasicObject.VMT_Placeholder1;
begin
Assert(False);
VMT_Placeholder1;
end;
initialization
TBasicObject.InitVMTPlaceholders;
end.
The nicest thing about this solution is that an inlined call to GetClassInfo results in only 2 opcodes:
MOV EAX, [EAX]
MOV EAX, [EAX+12]
You can't get it any faster than that!
Yes, that does look like impressively fast code!
Analyzing the Hack
Let's pause a little and analyze exactly what Patrick's hack is doing. The first thing to note is that he introduces a base class, TBasicObject, that all other classes that want the per-class storage should inherit from (directly or indirectly). The base class then does something peculiar - it declares a strict private virtual method (called VMT_Placeholder1) that can never be overridden. This is because it is never meant to be overridden - in fact it is not even intended to ever be called - it is only there to take up space and reserve a slot in the class' (and all derived classes') VMT (virtual method table - see here and here for details).
Reserving space in the VMT
Why would he want to waste space in the VMT? To reserve space that can be used to store per-class data, of course! The whole point of this exercise is to have the instance function GetClassInfo (and the corresponding class function ClassGetClassInfo) return an instance of a user-defined class TClassInfo that contains per-class meta-data (class attributes à la .NET, if you like) useful to the programmer. Let's look closer at the implementation of this function.
function TBasicObject.GetClassInfo: TClassInfo;
begin
Result := PBasicObjectOverlay(PClass(Self)^).OurClassInfo;
end;
There are some funky looking type casts going on here. This is an instance function, so the implicit Self parameter represents the TObject (or in this case, TBasicObject) instance that the method is being called on. As we already know, the first 4 bytes of the instance memory block contain a TClass - which is implemented as a pointer to the VMT of the class. The PClass(Self)^ cast first dereferences the instance pointer and picks up a copy of the VMT pointer. The VMT contains an array of the normal user-defined virtual methods of the class (at negative offsets we find the special TObject virtual methods and the magic VMT fields - details here).
Casting Magic
A TClass reference is opaque in the sense that you cannot explicitly dereference it in code - however the compiler does it all the time when you are calling virtual methods or accessing members such as ClassName. The code above takes the TClass value and casts it into a RBasicObjectOverlay record pointer. This record contains a single 4-byte field, OurClassInfo, that has the same type as the meta class object we want to access, TClassInfo. Since the VMT_Placeholder1 method is the first virtual method in TBasicObject, and since TBasicObject inherits from TObject (that has no "normal" (i.e. positive VMT offset) virtual methods), the OurClassInfo field access above just happens to match the VMT slot for VMT_Placeholder1. Got that?
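The slot-0 claim can be checked with a few lines of code. The following is only a minimal sketch, assuming a 32-bit Delphi compiler with the VMT layout described in this post; TSlotDemo and Placeholder are hypothetical stand-ins for TBasicObject and VMT_Placeholder1.

```pascal
program VmtSlotDemo;
{$APPTYPE CONSOLE}

type
  // Hypothetical stand-in for TBasicObject: exactly one virtual method,
  // introduced directly on top of TObject.
  TSlotDemo = class(TObject)
  public
    procedure Placeholder; virtual;
  end;

procedure TSlotDemo.Placeholder;
begin
  // Never called; it only reserves the first VMT slot.
end;

var
  ClassRef: TClass;
  Slot0: Pointer;
begin
  ClassRef := TSlotDemo;
  // A TClass value points at offset 0 of the VMT, so reading one pointer
  // through it yields the first user-defined virtual method slot.
  Slot0 := PPointer(ClassRef)^;
  if Slot0 = @TSlotDemo.Placeholder then
    WriteLn('Slot 0 holds the address of TSlotDemo.Placeholder')
  else
    WriteLn('Unexpected VMT layout on this compiler');
end.
```

Other compilers (or a future Delphi that gives TObject virtual methods at positive offsets) may lay out the VMT differently, which is exactly why the hack guards itself with Asserts.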
Doing the compiler's work
The trouble, of course, is that the VMT_Placeholder1 VMT slot does not contain a reference to a TClassInfo instance at all. Instead it contains the address of the virtual method implementation code (which will always be equal to @TBasicObject.VMT_Placeholder1 - being strict private, it cannot be overridden, remember?). So we will have to perform a little VMT patching again :-). (I told you this was a hack, right?) We'll divide this task in two parts. First, the initialization section of every unit that declares one or more TBasicObject descendants should contain the code to clear the VMT slot so that it is ready for our purposes.
class procedure TBasicObject.InitVMTPlaceholders;
begin
if Pointer(ClassGetClassInfo) = Addr(TBasicObject.VMT_Placeholder1) then
begin
PatchCodeDWORD(@PBasicObjectOverlay(Self).OurClassInfo, DWORD(nil));
Assert(ClassGetClassInfo = nil, 'Failed cleaning VMT of ' + ClassName);
end
else
Assert(ClassGetClassInfo = nil,
'Illegal value when checking initialized VMT of ' + ClassName);
end;
initialization
TBasicObject.InitVMTPlaceholders;
end.
First there are some sanity checks, using Asserts, ensuring that the compiler-generated value of the VMT slot we're going to patch matches our expectations. If the slot does not contain the static code address of the TBasicObject.VMT_Placeholder1 method, the method has either been overridden, not been compiled into a virtual method, or has received a different slot than we anticipated. Better safe than sorry.
Then we use the PatchCodeDWORD utility routine to do the actual dirty work of patching the VMT slot with a nil value (effectively clearing it). Again we check that the patching went well, raising an Assert exception if it didn't.
Creating a metainfo class
Ok, that's step one. The nil value is in fact assignment compatible as a TClassInfo reference, but you cannot store much data in a nil pointer ;). The next step is to actually create a TClassInfo instance and assign it to the now per-class variable slot we have made available in the VMT. This should only be done once per class - it can be done in the initialization section of the unit, or it could be done by some other startup code in the project. The assignment is done by calling the SetClassInfo class method. Here is a simple example where we have extended the application specific TClassInfo with a single integer field and a constructor to initialize it.
type
TClassInfo = class(TObject)
public
A: integer;
constructor Create(Value: integer);
end;
constructor TClassInfo.Create(Value: integer);
begin
inherited Create;
A := Value;
end;
initialization
TBasicObject.InitVMTPlaceholders;
TBasicObject.SetClassInfo(TClassInfo.Create(42));
Having looked at both the GetClassInfo function and the InitVMTPlaceholders method above, the implementation of SetClassInfo should not be surprising.
class procedure TBasicObject.SetClassInfo(const aClassInfo: TClassInfo);
begin
PatchCodeDWORD(@PBasicObjectOverlay(Self).OurClassInfo, DWORD(aClassInfo));
end;
This code patches the right VMT slot in the code segment with the instance reference of our per-class meta-data instance, TClassInfo. This should only be done once. After this the class-specific TClassInfo can be retrieved using the GetClassInfo function - and we can freely read and write the TClassInfo fields and properties - without any fear of triggering access violations. The TClassInfo instance lives in the dynamic heap, just like any other object instance.
Application level classes
Writing additional classes that support these per-class TClassInfo variables is easy. Just derive the class from TBasicObject, call InitVMTPlaceholders for the class and assign a new TClassInfo instance using SetClassInfo. Let's rewrite the Apples & Oranges sample from Part I using this new hacking technique.
type
TFruitClassInfo = class(TClassInfo)
private
var FInstanceCount: integer;
end;
TFruit = class(TBasicObject)
protected
class function FruitClassInfo: TFruitClassInfo; inline;
public
constructor Create;
class function InstanceCount: integer;
end;
TApple = class(TFruit)
end;
TOrange = class(TFruit)
end;
constructor TFruit.Create;
begin
inherited Create;
Inc(FruitClassInfo.FInstanceCount);
end;
class function TFruit.FruitClassInfo: TFruitClassInfo;
begin
Result := ClassGetClassInfo as TFruitClassInfo;
end;
class function TFruit.InstanceCount: integer;
begin
Result := FruitClassInfo.FInstanceCount;
end;
initialization
TFruit.SetClassInfo(TFruitClassInfo.Create);
TApple.SetClassInfo(TFruitClassInfo.Create);
TOrange.SetClassInfo(TFruitClassInfo.Create);
end.
First notice that the code is much simpler now. The InstanceCount function is introduced and fully implemented by the TFruit class - the TApple and TOrange classes no longer have to help implement it. Because the compiler does not support per-class variables, we see the presence of the hacking code in the initialization section. Note that I was lazy and skipped the step of checking the VMT slot and clearing it to nil first (which calling InitVMTPlaceholders on each class would have done). I like to live dangerously ;)).
We introduce a class that inherits from the generic TClassInfo and adds the variable we need for storage. To get type safe access to this TFruitClassInfo instance, I've also written a class function (FruitClassInfo) that returns it - performing an as-cast in the process.
ClassInfo Design
Depending on your application requirements and the homogeneity of your application classes, you might want to stick to a single TClassInfo class that contains all the fields and properties you need for all classes, or create specific TClassInfo descendants for some classes. Using a single shared class can produce faster code, because you don't need to do the type cast (you could "cheat" by using a faster hard-cast instead of the as-cast).
Inlining gotchas
I first wrote here that the current inlining capability of the compiler seems to prevent class methods from being inlined, and that you should therefore call the instance method GetClassInfo from time-critical code (assuming you have a live instance, rather than a static or dynamic class reference, to call it on). That was wrong; I was incorrectly generalizing from one bad sample. In Patrick's code above, TBasicObject.InitVMTPlaceholders calls the inlined class function ClassGetClassInfo, and if you look closely at the generated assembly code, you'll find that the call is not inlined. After a while I spotted the reason: method implementation order. The implementation of an inlined routine must have been "seen" by the compiler before a call to it - otherwise the compiler will not be able to inline it. With the Delphi compiler explicitly (and deliberately) designed to be a single-pass compiler, this is only natural. The compiler cannot output code it hasn't seen yet. This might be biting other people too, so I've updated my inlining post here. If you move InitVMTPlaceholders below ClassGetClassInfo, the call will be inlined. Nice to get rid of that little misunderstanding ;).
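The ordering rule is easy to reproduce in isolation. Here is a small sketch with a hypothetical TDemo class (none of these names come from Patrick's code): both callers return the same value, and the difference only shows up in the generated code, which you would have to inspect in the CPU view.

```pascal
program InlineOrderDemo;
{$APPTYPE CONSOLE}

type
  // Hypothetical class used only to illustrate implementation order.
  TDemo = class(TObject)
  public
    class function Value: Integer; inline;
    class function EarlyCaller: Integer;
    class function LateCaller: Integer;
  end;

class function TDemo.EarlyCaller: Integer;
begin
  // The body of Value has not been compiled yet at this point,
  // so a single-pass compiler has to emit a regular call here.
  Result := Value;
end;

class function TDemo.Value: Integer;
begin
  Result := 42;
end;

class function TDemo.LateCaller: Integer;
begin
  // The body of Value is known by now, so this call can be inlined.
  Result := Value;
end;

begin
  WriteLn(TDemo.EarlyCaller);
  WriteLn(TDemo.LateCaller);
end.
```

In practice this means putting small inlined accessors at the top of the implementation section, before any method that calls them.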
The performance angle
As Patrick noted in his email, the combination of inlining, the GetClassInfo instance method and the ingenious, but hacky, casting allows the compiler to produce very efficient code when accessing the TClassInfo per-class metadata on an object instance.
ClassInfo := Apple.GetClassInfo;
asm
mov eax,[eax]
mov eax,[eax]
end;
To go from an object instance to the object's class' metainfo TClassInfo only takes two machine code instructions and two memory accesses. The first converts from TObject to TClass, the second picks up the contents of the first VMT slot (i.e. index and offset 0). It doesn't get any faster than this - very impressive! ;)
A cleaner Hack?
I think Patrick knew he had a great hack up his sleeve, but at the same time something was bothering him. Could it be done differently, better or cleaner? It most probably couldn't be made faster. Quoting Patrick again:
But, this functionality shouldn't be so dirty as this to implement - do you know of a cleaner solution than this?
Well, it is possible to write a cleaner solution, but it would probably end up being slower. One way is to use a hash table with the TClass reference as the key - looking up the TClassInfo instance that corresponds to a specific class.
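To give an idea of what the hash table route could look like, here is a minimal sketch using TBucketList from the Contnrs unit. The names Registry and ClassVarsOf are my own inventions for illustration, and the sketch ignores cleanup and thread safety.

```pascal
program ClassVarsLookup;
{$APPTYPE CONSOLE}

uses
  Contnrs;

type
  TClassVars = class(TObject)
  public
    InstanceCount: Integer;
  end;

  TFruit = class(TObject)
  end;

  TApple = class(TFruit)
  end;

var
  // Maps a TClass (used as the key) to its TClassVars instance.
  // Hypothetical global; the entries live for the whole process.
  Registry: TBucketList;

function ClassVarsOf(AClass: TClass): TClassVars;
var
  Data: Pointer;
begin
  if Registry.Find(Pointer(AClass), Data) then
    Result := TClassVars(Data)
  else
  begin
    // First access for this class: create its storage lazily.
    Result := TClassVars.Create;
    Registry.Add(Pointer(AClass), Result);
  end;
end;

begin
  Registry := TBucketList.Create(bl64);
  Inc(ClassVarsOf(TApple).InstanceCount);
  Inc(ClassVarsOf(TApple).InstanceCount);
  Inc(ClassVarsOf(TFruit).InstanceCount);
  WriteLn('TApple: ', ClassVarsOf(TApple).InstanceCount);
  WriteLn('TFruit: ', ClassVarsOf(TFruit).InstanceCount);
end.
```

This works for any class and patches nothing, but every access pays for a hash lookup instead of two memory reads.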
Depending on your point of view you could make it more or less dirty by not using a new VMT slot for this, but instead overwriting and reusing one of the unused magic VMT slots, like the one for automated methods, AutoTable, a relic from Delphi 2 that is generally not used anymore. Here is the VMT pseudo record layout taken from this post.
type
  PVmt = ^TVmt;
  TVmt = packed record
    SelfPtr           : TClass;
    IntfTable         : Pointer;
    AutoTable         : Pointer;
    InitTable         : Pointer;
    TypeInfo          : Pointer;
    FieldTable        : Pointer;
    MethodTable       : Pointer;
    DynamicTable      : Pointer;
    ClassName         : PShortString;
    InstanceSize      : Longint; // stored by value in the VMT, not as a pointer
    Parent            : PClass;
    SafeCallException : PSafeCallException;
    AfterConstruction : PAfterConstruction;
    BeforeDestruction : PBeforeDestruction;
    Dispatch          : PDispatch;
    DefaultHandler    : PDefaultHandler;
    NewInstance       : PNewInstance;
    FreeInstance      : PFreeInstance;
    Destroy           : PDestroy;
  end;
The advantages of re-using the AutoTable instead are:
- You don't need the magic virtual method anymore
- It is already initialized to nil (except for legacy Delphi 2 code)
- You can thus use this trick for any class, not just those derived from your TBasicObject
The main disadvantage is that we're using a VMT slot that could conceivably be used, even though the automated section has been deprecated since Delphi 3.
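Whether the slot really is free for a given class is easy to probe before relying on it. The following is a small sketch, assuming a 32-bit Delphi where System.vmtAutoTable is available; AutoTableOf and Report are hypothetical helper names.

```pascal
program AutoTableProbe;
{$APPTYPE CONSOLE}

uses
  Classes;

// Reads the AutoTable slot of an arbitrary class.
function AutoTableOf(AClass: TClass): Pointer;
begin
  // vmtAutoTable is a negative byte offset relative to the TClass pointer.
  Result := PPointer(PAnsiChar(AClass) + vmtAutoTable)^;
end;

procedure Report(AClass: TClass);
begin
  if AutoTableOf(AClass) = nil then
    WriteLn(AClass.ClassName, ': AutoTable slot is nil (free)')
  else
    WriteLn(AClass.ClassName, ': AutoTable slot is in use');
end;

begin
  // None of these classes needs to descend from a special base class.
  Report(TObject);
  Report(TList);
  Report(TStringList);
end.
```

A non-nil result would indicate a class with an automated section (or one that somebody has already patched), and such a class should be left alone.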
Nostalgia: Delphi 2, COM and automated
It does have the drawback of not being compatible with old Delphi 2 code that uses the automated section. Back in those days, Delphi didn't support COM compatible interfaces - so it had to implement COM objects using abstract classes with virtual methods that just happened to match COM's requirements. Since Delphi only supported single-inheritance of classes, only a single "interface" could be implemented by a class. If you wanted an object to support multiple COM interfaces, each interface had to be implemented by a separate Delphi class, and you had to manually write QueryInterface methods that would marshal properly between the "interface" (read class) implementations.
The automated section was needed to get late-bound Automation support. The compiler generates special RTTI for automated sections so that the Delphi 2 COM support code could translate method and property name strings into callable entities. In Delphi 3, the COM support was substantially improved with proper support for interfaces and automation support with dual-interfaces (dispinterface). In short, automated sections in classes should (hopefully) be rare by now. They may still be used by clever code just to get at the automated RTTI for other purposes (a custom script language implementation, for example).
If you need access to full RTTI for (public or published) methods and properties, I would recommend using the more complete and better documented $METHODINFO ON feature instead. See my posts about that feature here, here, here and here.
A "cleaner" hack - overwriting AutoTable
Changing Patrick's original hack to overwrite the AutoTable in the existing VMT instead of overwriting the VMT slot of a new virtual method, simplifies the code considerably.
type
  PClassVars = ^TClassVars;
  TClassVars = class(TObject)
  public
    InstanceCount: integer;
  end;
  TBasicObject = class(TObject)
  protected
    class procedure SetClassVars(aClassVars: TClassVars);
  public
    class function GetClassVars: TClassVars; inline;
    function ClassVars: TClassVars; inline;
  end;
  // Class reference type used by RegisterClassVarsSupport
  TBasicObjectClass = class of TBasicObject;
const
vmtClassVars = System.vmtAutoTable;
function TBasicObject.ClassVars: TClassVars;
begin
Result := PClassVars(PInteger(Self)^ + vmtClassVars)^;
end;
class function TBasicObject.GetClassVars: TClassVars;
begin
Result := PClassVars(Integer(Self) + vmtClassVars)^;
end;
class procedure TBasicObject.SetClassVars(aClassVars: TClassVars);
begin
PatchCodeDWORD(PDWORD(Integer(Self) + vmtClassVars), DWORD(aClassVars));
end;
We no longer need the artificially created strict private virtual method, nor the method to clear the VMT slot (as we assume that the AutoTable slot is already free). Notice that we use one of the magic constants from the System unit to determine the offset that we will use, vmtAutoTable. From the System unit:
const
...
vmtSelfPtr = -76;
vmtIntfTable = -72;
vmtAutoTable = -68;
vmtInitTable = -64;
Here you see that the AutoTable is at negative offset -68 (or -$44 in hex) from the base TClass pointer. I've also chosen to rename ClassInfo to ClassVars, to reduce confusion with the existing TObject.ClassInfo that returns a pointer to the RTTI of the published properties in the class (and is used by the TypInfo unit). The SetClassVars and GetClassVars methods are non-static class methods (so that they receive the implicit Self: TClass parameter that contains the runtime class reference), while the instance function ClassVars returns the TClassVars instance that holds the per-class variables of the class of the object instance.
To keep the code simple and fast, I've selected to put the InstanceCount field directly in the TClassVars class (instead of creating a descendent class). To add a certain shim of abstraction to the initialization of the ClassVars slot, I've thrown in a simple registration procedure as well.
procedure RegisterClassVarsSupport(const Classes: array of TBasicObjectClass);
var
LClass: TBasicObjectClass;
begin
for LClass in Classes do
if LClass.GetClassVars = nil then
LClass.SetClassVars(TClassVars.Create)
else
raise Exception.CreateFmt(
'Class %s has automated section or duplicated registration', [LClass.ClassName]);
end;
Our fruit example then becomes simpler too.
type
TFruit = class(TBasicObject)
public
constructor Create;
function InstanceCount: integer; inline;
class function ClassInstanceCount: integer; inline;
end;
TApple = class(TFruit)
end;
TOrange = class(TFruit)
end;
constructor TFruit.Create;
begin
inherited Create;
Inc(ClassVars.InstanceCount);
end;
function TFruit.InstanceCount: integer;
begin
Result := ClassVars.InstanceCount;
end;
class function TFruit.ClassInstanceCount: integer;
begin
Result := GetClassVars.InstanceCount;
end;
initialization
RegisterClassVarsSupport([TFruit, TApple, TOrange]);
end.
The test code stays the same as before. A quick look at the generated machine code for going from an object instance, via the object's TClass reference to the TClassVars slot in the VMT and finally referencing an integer field (InstanceCount) is impressive.
Count := Apple.ClassInstanceCount;
asm
mov eax,[esi]
add eax,-$44
mov eax,[eax]
mov ebx,[eax+$04]
end;
Only four instructions. Notice that this is one instruction more than the virtual method slot hack we started with. The main reason for this is that the vmtAutoTable is at a negative offset in the VMT, while the user-defined virtual method is at a positive offset. Currently, there does not seem to be a way to force the compiler to put the constant offset calculation inside the memory referencing mov reg, mem opcode for negative offsets. The ideal would be if the compiler could generate the following machine code instead.
Count := Apple.ClassInstanceCount;
asm
mov eax,[esi]
mov eax,[eax-$44]
mov ebx,[eax+$04]
end;
Here the subtraction of the $44 constant has been put into the opcode itself and this is smaller and faster than explicitly modifying the register. We may be looking into this issue in a later post.
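Until then, one way to get the folded addressing mode is to write the two loads by hand in the built-in assembler. This is only a sketch, assuming the 32-bit register calling convention (first parameter and result both in EAX); ClassVarsSlotOf is a hypothetical helper name.

```pascal
program AsmClassVars;
{$APPTYPE CONSOLE}

// Hypothetical helper: returns the raw contents of the AutoTable slot
// for the class of the given instance.
function ClassVarsSlotOf(Instance: TObject): Pointer;
asm
  mov eax, [eax]                 // instance -> VMT pointer (TClass)
  mov eax, [eax + vmtAutoTable]  // negative offset folded into the load
end;

var
  Obj: TObject;
begin
  Obj := TObject.Create;
  try
    if ClassVarsSlotOf(Obj) = nil then
      WriteLn('AutoTable slot of TObject is nil')
    else
      WriteLn('AutoTable slot of TObject is in use');
  finally
    Obj.Free;
  end;
end.
```

The catch is that routines with asm bodies cannot be inlined, so the saved instruction is paid for with a call and a return. For hot paths the inlined Pascal version is most likely still the better deal.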
Also notice that while all class VMTs have AutoTable fields, we are still using a TBasicObject class that the TFruit class is inheriting from. We may be looking at different ways of (trying to) overcome this restriction later. The blog post is running long enough already - most of you must be sleeping by now - and besides, Windows Live Writer will not let me edit in HTML mode any more (for inserting the HTML code snippets that Delphi2HTML generates for me) - seems like it has an incredibly silly 32 KB limit on editing HTML (hey, what is this, 1982??!). OTOH, it saved you from an even longer post ;)).
I'll just give Patrick the word again here at the end of the article:
I've now switched over to using vmtAutoTable completely, as you suggested. I've already applied it to all the classes we had older hacks for, and it did wonders for the speed of our query-engine, so thanks!
Well, thanks to you Patrick for sharing your ideas and hack with us! It's very interesting, but is only a small part of the puzzle. From our small email conversation, it is clear that Patrick and his colleagues at Every Angle are really a smart bunch. They have developed a very impressive architecture specialized for their extreme requirements. You can check the high-level descriptions of their SAP database analytics and data mining products here.
On a more technical level it suffices to say that they use custom and extremely compact and fast data structures, tricks and hacks to be able to represent millions and millions of objects within the constraints of a 32-bit Windows system. Throw in the use of Physical Address Extensions, storing per-class information in "virtual" class vars to reduce object instance size, creation of classes and their VMTs dynamically at runtime (!!), pointer packing, multithreading, the list just goes on and on.
Maybe we can convince Patrick to start a blog of his own, to share some of his ideas and techniques - alas, much of it may be company confidential.