Friday, May 04, 2007

Hack#17: Virtual class variables, Part I

[Note: This blog post was inspired by an email conversation I had with Patrick van Logchem - more details on this in Part II]

Proper Object Pascal support for class variables (class var) was first introduced in Delphi 8 for .NET and later in Delphi 2005 for Win32. Functionally, class vars in Object Pascal (and most other languages, for that matter) work like class-scoped global variables, i.e. their lifetime is global and there is only one copy of the variable per declaration. Indeed, before having access to proper class variables, most Delphi programmers would instead use a global variable hidden in the implementation section of the unit that declares the class.

Poor man's class variables

For instance, let's say you want to keep track of the number of instances that have been created of a specific class. In Delphi 7 and earlier you might have written:

type
  TFruit = class
  public
    constructor Create;
    class function InstanceCount: integer;
  end;

implementation

var
  FInstanceCount: integer;

constructor TFruit.Create;
begin
  inherited Create;
  Inc(FInstanceCount);
end;

class function TFruit.InstanceCount: integer;
begin
  Result := FInstanceCount;
end;

Here the FInstanceCount global variable is used as a poor-man's class variable. It is incremented in the constructor and we use a class function to return its value. [Yes, a more robust implementation would probably override NewInstance and FreeInstance to increment and decrement the counter, respectively - and we should probably make them thread-safe, but we're trying to keep things simple here - HV].
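
The more robust variant mentioned in the aside could look roughly like the sketch below. It is only an illustration under a few assumptions: it targets Delphi for Win32, takes InterlockedIncrement and InterlockedDecrement from the Windows unit (spelled Winapi.Windows in newer Delphi versions), and is written as a stand-alone console program, so the counter is an ordinary global rather than one hidden in an implementation section. Note that it counts live instances rather than instances ever created.

```pascal
program LiveInstanceCount;
{$APPTYPE CONSOLE}

uses
  Windows; // InterlockedIncrement / InterlockedDecrement

type
  TFruit = class
  public
    class function NewInstance: TObject; override;
    procedure FreeInstance; override;
    class function InstanceCount: Integer;
  end;

var
  // In a real unit this variable would be hidden in the implementation section.
  FInstanceCount: Integer;

class function TFruit.NewInstance: TObject;
begin
  Result := inherited NewInstance;
  InterlockedIncrement(FInstanceCount); // atomic update
end;

procedure TFruit.FreeInstance;
begin
  InterlockedDecrement(FInstanceCount); // atomic update
  inherited FreeInstance;
end;

class function TFruit.InstanceCount: Integer;
begin
  Result := FInstanceCount;
end;

var
  A, B: TFruit;
begin
  A := TFruit.Create;
  B := TFruit.Create;
  Writeln('Live instances: ', TFruit.InstanceCount); // 2
  A.Free;
  Writeln('Live instances: ', TFruit.InstanceCount); // 1
  B.Free;
  Writeln('Live instances: ', TFruit.InstanceCount); // 0
end.
```

Because NewInstance and FreeInstance bracket every allocation, the counter stays balanced even if a constructor raises an exception.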


Language support for class variables


Fast-forward to Delphi 2007 and we can rewrite the code using a class var instead (class vars have been supported since Delphi 8 for .NET).

type
  TFruit = class
  private
    class var FInstanceCount: integer;
  public
    constructor Create;
    class property InstanceCount: integer read FInstanceCount;
  end;

implementation

constructor TFruit.Create;
begin
  inherited Create;
  Inc(FInstanceCount);
end;

Note that we have changed the InstanceCount class function into a class property instead. This reduces the amount of code and is more efficient - I covered class var and class property in the D4DNP Chapter 10 extract here.


This change will put the OOP purists at ease, but the underlying implementation (the code at the CPU level) stays the same. The class var FInstanceCount is assigned a static address in the global data segment by the linker. The implication is that the class var is shared by the TFruit class and all of its descendant classes.


Naïve assumptions


For instance, a naïve programmer wanting to keep track of the number of apples and oranges created in his application may write something like:

type
  TApple = class(TFruit)
    // ..
  end;
  TOrange = class(TFruit)
    // ..
  end;

procedure Test;
var
  List: TList;
begin
  List := TList.Create;
  List.Add(TApple.Create);
  List.Add(TApple.Create);
  List.Add(TOrange.Create);
  Writeln('Apples: ', TApple.InstanceCount);
  Writeln('Oranges: ', TOrange.InstanceCount);
  readln;
end;

The expected output is 2 apples and 1 orange, but the actual output is:

Apples: 3
Oranges: 3

The reason, of course, is that the class var is shared between the TFruit, TApple and TOrange classes.


Explicit per-class class variables implementation


The most straightforward solution to this problem is to explicitly declare class vars in each descendant class. Then we can use a virtual class function to return the instance count for each class. For instance:

type
  TFruit = class
  private
    class var FInstanceCount: integer;
  public
    constructor Create;
    class function InstanceCount: integer; virtual;
  end;
  TApple = class(TFruit)
  private
    class var FInstanceCount: integer;
  public
    constructor Create;
    class function InstanceCount: integer; override;
  end;
  TOrange = class(TFruit)
  private
    class var FInstanceCount: integer;
  public
    constructor Create;
    class function InstanceCount: integer; override;
  end;

implementation

constructor TFruit.Create;
begin
  inherited Create;
  Inc(FInstanceCount);
end;

class function TFruit.InstanceCount: integer;
begin
  Result := FInstanceCount;
end;

constructor TApple.Create;
begin
  inherited Create;
  Inc(FInstanceCount);
end;

class function TApple.InstanceCount: integer;
begin
  Result := FInstanceCount;
end;

constructor TOrange.Create;
begin
  inherited Create;
  Inc(FInstanceCount);
end;

class function TOrange.InstanceCount: integer;
begin
  Result := FInstanceCount;
end;

That's a lot of repetitive code, but at least when we now run the same Test procedure above we get the expected result.

Apples: 2
Oranges: 1

If you want to support this kind of per-class meta information in a large hierarchy of classes (say in a custom business class library), it quickly becomes unwieldy to duplicate this code in every subclass. The InstanceCount property or function is a feature introduced and implemented by the initial base class - so why should all subclasses be required to help implement it?


Virtual class variables


What we need is a new language feature - a new kind of class var that is implemented not as a simple global variable, but on a per-class (per-VMT) basis. Let's call this imaginary feature virtual class vars - virtual because the value varies with the run-time class, just like the implementation of a virtual class function varies with the run-time class. An imaginary syntax for this imaginary feature could be:

    class var FInstanceCount: integer; virtual;

This would be the most natural syntax, IMO, but it would require promoting 'virtual' from a simple position-sensitive directive to a full-fledged reserved word. Making it a reserved word would break existing code that uses 'virtual' as an identifier, so a more realistic syntax would be one that only uses virtual as a directive, like this:

    class virtual var FInstanceCount: integer;

This is the same reason we have the somewhat unintuitive declaration syntax class sealed and class abstract. With this imaginary syntax and language feature in place, the following code sample:

type
  TFruit = class
  private
    class virtual var FInstanceCount: integer;
  public
    constructor Create;
    // A class property that reads the field directly, so no getter is needed
    class property InstanceCount: integer read FInstanceCount;
  end;
  TApple = class(TFruit)
    //...
  end;
  TOrange = class(TFruit)
    //...
  end;

implementation

constructor TFruit.Create;
begin
  inherited Create;
  Inc(FInstanceCount);
end;

procedure Test;
var
  List: TList;
begin
  List := TList.Create;
  List.Add(TApple.Create);
  List.Add(TApple.Create);
  List.Add(TOrange.Create);
  Writeln('Apples: ', TApple.InstanceCount);
  Writeln('Oranges: ', TOrange.InstanceCount);
  readln;
end;

would now output the expected:


Apples: 2
Oranges: 1
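
Since the imaginary syntax does not compile, here is a rough sketch of one way to get the same per-class output with today's compiler: keep the counters in a small lookup keyed on the run-time class. This is not the hack covered in Part II, just a plain illustration. The names SlotFor, FClasses and FCounts are made up for the example, and the code is not thread-safe.

```pascal
program PerClassCounters;
{$APPTYPE CONSOLE}

type
  TFruit = class
  private
    // Hypothetical bookkeeping: parallel arrays, one slot per run-time class.
    class var FClasses: array of TClass;
    class var FCounts: array of Integer;
    class function SlotFor(AClass: TClass): Integer;
  public
    constructor Create;
    class function InstanceCount: Integer;
  end;
  TApple = class(TFruit);
  TOrange = class(TFruit);

class function TFruit.SlotFor(AClass: TClass): Integer;
var
  I: Integer;
begin
  for I := 0 to High(FClasses) do
    if FClasses[I] = AClass then
    begin
      Result := I;
      Exit;
    end;
  // First time this class is seen: append a slot (SetLength zero-fills it).
  Result := Length(FClasses);
  SetLength(FClasses, Result + 1);
  SetLength(FCounts, Result + 1);
  FClasses[Result] := AClass;
end;

constructor TFruit.Create;
var
  Slot: Integer;
begin
  inherited Create;
  Slot := SlotFor(ClassType); // ClassType is the run-time class of the instance
  Inc(FCounts[Slot]);
end;

class function TFruit.InstanceCount: Integer;
var
  Slot: Integer;
begin
  Slot := SlotFor(Self); // in a class method, Self is the class reference
  Result := FCounts[Slot];
end;

var
  A1, A2, O1: TFruit;
begin
  A1 := TApple.Create;
  A2 := TApple.Create;
  O1 := TOrange.Create;
  Writeln('Apples: ', TApple.InstanceCount);   // Apples: 2
  Writeln('Oranges: ', TOrange.InstanceCount); // Oranges: 1
  O1.Free;
  A2.Free;
  A1.Free;
end.
```

The linear search makes every access far slower than reading a real class var, which is part of why a compiler-supported, VMT-based solution is attractive.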

An old report


As it happens, I actually suggested to Borland that they implement class variables with these per-class semantics way back in 1998 - when Delphi 4 was the current version and plain class var and class property were still four Delphi versions away. Excerpts from my original report (which was closed as "As Designed" ages ago):



Please add proper class fields. This would also support class properties. Suggested syntax:

type
  TFoo = class
  private
    class FBar: integer;
    class procedure SetBar(Value: integer);
  public
    class property Bar: integer read FBar write SetBar;
  end;

class procedure TFoo.SetBar(Value: integer);
begin
  if Value <> FBar then
  begin
    FBar := Value;
  end;
end;

This feature is very useful when working much with meta-classes. You can kind-of simulate this by using global variables in the implementation section, but it is not what I want. If you use the global-variable approach, all derived classes will share the same variable. This is not ideal.

Each new derived class should have it's own copy of the variable (just like ClassName and InstanceSize are unique for each class). Both idioms might be useful, though. Maybe there should be a separate syntax for the shared class field thing?

  TFoo = class
  private
    class FBar: integer; const;

Although the suggested syntax is different (and in hindsight, horrid), this is basically the same feature request we discussed above. As we know now, classic shared class vars have been implemented, while per-class virtual class fields have not. I can't say I blame them (Borland/CodeGear) though - demand hasn't been high for this feature, and I don't know of any other language that implements it (do you?).


Virtual class var implementation


How could such a language feature be implemented? Well, we know how virtual methods (both instance and class methods) are implemented - the compiler assigns a unique slot in the VMT (virtual method table) for each introduced virtual method. There is one VMT for each class. Each virtual method has an associated unique VMT offset (which can be retrieved in BASM using the VMTOFFSET directive) that is used to locate the slot and look up the code address of the virtual method.
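
To make the one-VMT-per-class point concrete, here is a small sketch. It relies on two facts about the Delphi object model: the first pointer-sized field of every instance holds its class reference, and a class reference is a pointer into that class's VMT. The Describe method and the variable names are only there for the illustration.

```pascal
program VmtPointerDemo;
{$APPTYPE CONSOLE}

type
  TFruit = class
  public
    function Describe: string; virtual;
  end;
  TApple = class(TFruit)
  public
    function Describe: string; override;
  end;

function TFruit.Describe: string;
begin
  Result := 'fruit';
end;

function TApple.Describe: string;
begin
  Result := 'apple';
end;

var
  Fruit: TFruit;
  FruitClass, AppleClass: TClass;
begin
  FruitClass := TFruit;
  AppleClass := TApple;
  Fruit := TApple.Create;
  try
    // The hidden first field of the instance is the class reference (VMT pointer).
    Writeln(PPointer(Fruit)^ = Pointer(AppleClass));   // TRUE
    // Every class has its own VMT, so the two class references differ.
    Writeln(Pointer(FruitClass) <> Pointer(AppleClass)); // TRUE
    // The call is dispatched through the slot in TApple's VMT.
    Writeln(Fruit.Describe);                            // apple
  finally
    Fruit.Free;
  end;
end.
```

A virtual class var would need the same kind of per-class lookup, only for data instead of code.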


VMT slot per field


What if we extended the VMT to contain one extra slot per declared virtual class var? This would be a straightforward solution. The main benefit is that the VMT of classes without virtual class vars (i.e. 100% of existing classes) would not change at all. The problem is that the VMT is stored in the code segment - and keeping writable data variables there is Not a Good Idea (tm).


As we have seen in our recent self-modifying code hacks, to avoid access violations and DEP (Data Execution Prevention) problems you have to be careful when mixing code and data. In particular, to write data to the code segment you have to change the access rights of the code page(s) the data resides in. And to be a good citizen you should restore the rights to their original state when you're done, like this helper routine does:

procedure PatchCodeDWORD(Code: PDWORD; Value: DWORD);
// Self-modifying code - change one DWORD in the code segment
var
  RestoreProtection, Ignore: DWORD;
begin
  if VirtualProtect(Code, SizeOf(Code^), PAGE_EXECUTE_READWRITE,
    RestoreProtection) then
  begin
    Code^ := Value;
    VirtualProtect(Code, SizeOf(Code^), RestoreProtection, Ignore);
    FlushInstructionCache(GetCurrentProcess, Code, SizeOf(Code^));
  end;
end;

And doing this is not thread-safe, of course. If you're really unlucky another thread could come and change the rights again before you get the chance to perform the write operation. So this is not something you want to do every time you change a virtual class var. Strike solution one.


Virtual ClassFieldTable


Doing something to the VMT is a good idea, but storing the actual live data there is not. As usual, adding an extra level of indirection solves the problem. We should extend the VMT with a new magic slot - let's call it ClassFieldTable (it's implicit that we're talking about virtual class fields here - otherwise it wouldn't belong in the VMT; the pseudo-code later in this post names the same slot ClassVarTable). This slot points to a record structure in the global data segment. The record contains fields that correspond to all the virtual class vars that have been declared on the class or inherited from the parent class. Each derived class has a unique copy of this record in the data segment - and the ClassFieldTable slot in the VMT points to the unique copy.


Now we have solved the cannot-write-data-to-code-pages problem. The ClassFieldTable pointer is still part of the VMT and stored in a code page, but it's fixed up by the linker/loader to point to the correct global record variable and never changes at run-time. An added benefit of using implicitly declared global record variables (i.e. generated by the compiler) for each class's virtual class vars is that we get the compiler magic to finalize managed fields in the record (AnsiString, WideString, interface, Variant and dynamic array) for free.


Compiler's implementation


Now let's imagine what the compiler would have to do to implement virtual class vars, using some pseudo-code on a variant of the imaginary sample code above. Here is the modified example, where the three classes form a 3-generation inheritance chain and I've added another virtual class var to one of the descendant classes:

type
  TFruit = class
  private
    class virtual var FInstanceCount: integer;
  public
    constructor Create;
    class property InstanceCount: integer read FInstanceCount;
  end;
  TCitrus = class(TFruit)
  end;
  TOrange = class(TCitrus)
  private
    class virtual var ClassDescription: string;
  end;

And here is the pseudo-code that tries to illustrate what the compiler would do to implement this code sample.

type
  TFruit = class
  private
    class virtual var FInstanceCount: integer;
  public
    constructor Create;
    class property InstanceCount: integer read FInstanceCount;
  end;
  TCitrus = class(TFruit)
  end;
  TOrange = class(TCitrus)
  private
    class virtual var ClassDescription: string;
  end;
// Compiler generated types and variables
var
  // Global variables used for per-class virtual class fields
  FruitClassVars: record
    FInstanceCount: integer;
  end;
  CitrusClassVars: record // inherits field
    FInstanceCount: integer;
  end;
  OrangeClassVars: record // inherits field, introduces new field
    FInstanceCount: integer;
    ClassDescription: string;
  end;
// New VMT slot initialization, generated by compiler (pseudo-code):
TFruitVMT = record
  ClassVarTable := @FruitClassVars;
end;
TCitrusVMT = record
  ClassVarTable := @CitrusClassVars;
end;
TOrangeVMT = record
  ClassVarTable := @OrangeClassVars;
end;
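
The pseudo-code can be approximated by hand in code that compiles today. In the sketch below a virtual class function stands in for the ClassVarTable slot (a virtual class function occupies a VMT slot of its own), and ordinary global records stand in for the compiler-generated ones. The record type names are invented for the example, and the overrides that the compiler would generate implicitly have to be written out in every class.

```pascal
program ClassVarTableSketch;
{$APPTYPE CONSOLE}

type
  // Hand-written stand-ins for the compiler-generated records.
  PFruitClassVars = ^TFruitClassVars;
  TFruitClassVars = record
    FInstanceCount: Integer;
  end;
  TOrangeClassVars = record
    FInstanceCount: Integer;  // inherited field, same offset as in the parent record
    ClassDescription: string; // field introduced by TOrange
  end;

  TFruit = class
  protected
    // Plays the role of the ClassVarTable slot in the VMT.
    class function ClassVarTable: PFruitClassVars; virtual;
  public
    constructor Create;
    class function InstanceCount: Integer;
  end;
  TCitrus = class(TFruit)
  protected
    class function ClassVarTable: PFruitClassVars; override;
  end;
  TOrange = class(TCitrus)
  protected
    class function ClassVarTable: PFruitClassVars; override;
  end;

var
  // One zero-initialized record per class, living in the global data segment.
  FruitClassVars, CitrusClassVars: TFruitClassVars;
  OrangeClassVars: TOrangeClassVars;

class function TFruit.ClassVarTable: PFruitClassVars;
begin
  Result := @FruitClassVars;
end;

class function TCitrus.ClassVarTable: PFruitClassVars;
begin
  Result := @CitrusClassVars;
end;

class function TOrange.ClassVarTable: PFruitClassVars;
begin
  // The larger record starts with the same field as the parent record.
  Result := PFruitClassVars(@OrangeClassVars);
end;

constructor TFruit.Create;
var
  Vars: PFruitClassVars;
begin
  inherited Create;
  Vars := ClassVarTable; // resolved through the run-time class of the instance
  Inc(Vars^.FInstanceCount);
end;

class function TFruit.InstanceCount: Integer;
var
  Vars: PFruitClassVars;
begin
  Vars := ClassVarTable; // resolved through the class the method is called on
  Result := Vars^.FInstanceCount;
end;

begin
  TFruit.Create.Free;
  TCitrus.Create.Free;
  TCitrus.Create.Free;
  TOrange.Create.Free;
  TOrange.Create.Free;
  TOrange.Create.Free;
  Writeln('Fruit: ', TFruit.InstanceCount);    // Fruit: 1
  Writeln('Citrus: ', TCitrus.InstanceCount);  // Citrus: 2
  Writeln('Oranges: ', TOrange.InstanceCount); // Oranges: 3
end.
```

Unlike a real compiler-maintained slot, every access here costs a virtual call, but the data layout is the one described in the pseudo-code: one record per class, reached through the class's VMT.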

It's interesting to see that this closely resembles the implementation suggestion I made some 9 years ago. From the same report I showed an excerpt of above:



How to implement this:



The shared-variable type of class field could be implemented by using a space from the global data segment. The underlying implementation would thus be the same as using a global variable - only the syntax would be more logical (than using an explicit global variable).

The each-class-one-variable type of class field could be implemented by adding two fields to the VMT:

ClassInstanceSize : Integer
ClassInstanceData: Pointer;

The ClassInstanceSize would give the number of bytes allocated for class fields in [the] class. The ClassInstanceData would point to the block of memory containing the class fields. This memory block should be in the global data segment, initialized to all zeros.

At compile-time these fields would be setup while creating the VMT tables. A class that inherits from another class and adds its own fields would have the ClassInstanceSize = Parent.ClassInstanceSize + SizeOf(class fields in this class).


I now think that keeping ClassInstanceSize (or ClassVarTableSize) in the VMT is unnecessary. The compiler needs this information in its internal bookkeeping, but it is not strictly needed at runtime. In a way this is the same case as for virtual methods. The compiler keeps track of the number of virtual methods in each class (as part of the compile-time class information stored in the .dcu), but the code it generates does not need it, and thus there is no VirtualMethodCount field in the VMT. The same logic applies to our new virtual class fields and the new ClassVarTable slot.


To be continued...


This post is getting a bit long, so I've decided to split it in two. In Part II we will look at Patrick's hack of implementing a workaround for the lack of proper language level virtual class vars.

3 comments:

Jim McKeeth said...

Another great post. Thanks Hallvard!

Anonymous said...

Virtual muddies the waters and suggests that it can be overridden, which would be VERY messy.

How about SPECIFIC, as in:

class SPECIFIC var FInstanceCount: integer;

It IS an interesting idea, but the keyword definitely needs work, and virtual implies something else here.

Hallvard Vassbotn said...

> Virtual muddies the waters and suggests that it can be overridden, which would be VERY messy.

Maybe. I'm not too concerned about what syntax this feature would use, but I think that virtual makes sense.

Virtual indicates that it varies by class. The value of a virtual class var varies by class. The implementation of a virtual method varies by class.



Copyright © 2004-2007 by Hallvard Vassbotn