Monday, May 07, 2007

DN4DP#8: Unicode identifiers

This post continues the series of The Delphi Language Chapter teasers from Jon Shemitz’ .NET 2.0 for Delphi Programmers book. Last time we included the section on the new inlining support of the compiler. This post briefly covers the internationalization of Pascal identifiers (Klingon code, anyone?).

Note that I do not get any royalties from the book and I highly recommend that you get your own copy – for instance at Amazon.

"Unicode identifiers

Traditional Pascal and Delphi have restricted identifiers to lower- and upper-case ASCII letters (a-z, A-Z), underscores and digits. In .NET all strings are Unicode, and identifiers can use Unicode characters, including national characters such as the Norwegian Æ, Ø and Å.

Delphi now supports Unicode characters in identifiers, as long as the source file is encoded in a Unicode format (UTF-8 or UCS-2). Identifiers that end up in RTTI data (unit names, class names and published members) must still be pure ASCII - the main reason is to avoid breaking code that reads RTTI strings.

type
  TUnicodeClass = class
  private
    FAntallÅr: integer;
  public
    procedure SetAntallÅr(const Value: integer);
    property AntallÅr: integer read FAntallÅr write SetAntallÅr;
  end;

procedure TestÆØÅ;
var
  Unicode: TUnicodeClass;
begin
  Unicode := TUnicodeClass.Create;
  try
    Unicode.AntallÅr := 42;
  finally
    Unicode.Free;
  end;
end;

Tip To change the encoding of a source file to UTF-8, right-click in the Delphi editor and select File Format | UTF-8, then save the file.

"

Friday, May 04, 2007

Hack#17: Virtual class variables, Part I

[Note: This blog post was inspired by an email conversation I had with Patrick van Logchem - more details on this in Part II]

Proper Object Pascal support for class var declarations was first introduced in Delphi 8 for .NET and later in Delphi 2005 for Win32. Functionally, class vars in Object Pascal (and most other languages, for that matter) work like class-scoped global variables: their lifetime is global and there is only one copy of the variable per declaration. Indeed, before proper class variables were available, most Delphi programmers would instead use a global variable hidden in the implementation section of the unit that declares the class.

Poor man's class variables

For instance, let's say you want to keep track of the number of instances of a specific class that have been created. In Delphi 7 and earlier you might have written:

type
  TFruit = class
  public
    constructor Create;
    class function InstanceCount: integer;
  end;

implementation

var
  FInstanceCount: integer;

constructor TFruit.Create;
begin
  inherited Create;
  Inc(FInstanceCount);
end;

class function TFruit.InstanceCount: integer;
begin
  Result := FInstanceCount;
end;

Here the FInstanceCount global variable is used as a poor-man's class variable. It is incremented in the constructor and we use a class function to return its value. [Yes, a more robust implementation would probably override NewInstance and FreeInstance to increment and decrement the counter, respectively - and we should probably make them thread-safe, but we're trying to keep things simple here - HV].
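For the curious, here is a minimal sketch of the more robust variant mentioned in the note above. It assumes Win32 Delphi, because it uses InterlockedIncrement and InterlockedDecrement from the Windows unit. It counts live instances rather than created ones. The console program wrapper and the LiveFruitCount name are only for illustration.

```pascal
program LiveFruitCount;
{$APPTYPE CONSOLE}

uses
  Windows; // InterlockedIncrement / InterlockedDecrement

type
  TFruit = class
  public
    class function NewInstance: TObject; override;
    procedure FreeInstance; override;
    class function InstanceCount: integer;
  end;

var
  FInstanceCount: integer; // poor man's class variable, counts live instances

class function TFruit.NewInstance: TObject;
begin
  Result := inherited NewInstance;      // allocate and initialize the instance
  InterlockedIncrement(FInstanceCount); // thread-safe increment
end;

procedure TFruit.FreeInstance;
begin
  InterlockedDecrement(FInstanceCount); // thread-safe decrement
  inherited FreeInstance;               // release the memory
end;

class function TFruit.InstanceCount: integer;
begin
  Result := FInstanceCount;
end;

var
  A, B: TFruit;
begin
  A := TFruit.Create;
  B := TFruit.Create;
  Writeln('Live fruits: ', TFruit.InstanceCount); // Live fruits: 2
  A.Free;
  Writeln('Live fruits: ', TFruit.InstanceCount); // Live fruits: 1
  B.Free;
end.
```

Every constructor goes through NewInstance, and every destructor goes through FreeInstance. As a result, descendants are counted without any extra code. They all still share the one counter, though.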


Language support for class variables


Fast-forward to Delphi 2007 and we can rewrite the code using a class var instead (class vars have been supported since Delphi 8 for .NET).

type
  TFruit = class
  private
    class var FInstanceCount: integer;
  public
    constructor Create;
    class property InstanceCount: integer read FInstanceCount;
  end;

implementation

constructor TFruit.Create;
begin
  inherited Create;
  Inc(FInstanceCount);
end;

Note that we have changed the InstanceCount class function into a class property. This reduces the amount of code and is more efficient. I covered class var and class property in the DN4DP Chapter 10 extract here.


This change will put OOP purists at ease, but the underlying implementation (the code at the CPU level) stays the same. The linker assigns the class var FInstanceCount a static address in the global data segment. The implication is that the class var is shared among the TFruit class and all its descendant classes.


Naïve assumptions


For instance, a naïve programmer wanting to keep track of the number of apples and oranges created in his application may write something like:

type
  TApple = class(TFruit)
    // ..
  end;
  TOrange = class(TFruit)
    // ..
  end;

procedure Test;
var
  List: TList;
begin
  List := TList.Create;
  List.Add(TApple.Create);
  List.Add(TApple.Create);
  List.Add(TOrange.Create);
  Writeln('Apples: ', TApple.InstanceCount);
  Writeln('Oranges: ', TOrange.InstanceCount);
  readln;
end;

The expected output is 2 apples and 1 orange, but the actual output is:

Apples: 3
Oranges: 3

The reason, of course, is that the class var is shared between the TFruit, TApple and TOrange classes.


Explicit per-class class variables implementation


The most straightforward solution to this problem is to explicitly declare class vars in each descendant class. Then we can use a virtual class function to return the instance count for each class. For instance:

type
  TFruit = class
  private
    class var FInstanceCount: integer;
  public
    constructor Create;
    class function InstanceCount: integer; virtual;
  end;
  TApple = class(TFruit)
  private
    class var FInstanceCount: integer;
  public
    constructor Create;
    class function InstanceCount: integer; override;
  end;
  TOrange = class(TFruit)
  private
    class var FInstanceCount: integer;
  public
    constructor Create;
    class function InstanceCount: integer; override;
  end;

implementation

constructor TFruit.Create;
begin
  inherited Create;
  Inc(FInstanceCount);
end;

class function TFruit.InstanceCount: integer;
begin
  Result := FInstanceCount;
end;

constructor TApple.Create;
begin
  inherited Create;
  Inc(FInstanceCount);
end;

class function TApple.InstanceCount: integer;
begin
  Result := FInstanceCount;
end;

constructor TOrange.Create;
begin
  inherited Create;
  Inc(FInstanceCount);
end;

class function TOrange.InstanceCount: integer;
begin
  Result := FInstanceCount;
end;

That's a lot of repetitive code, but at least when we now run the same Test procedure above we get the expected result.

Apples: 2
Oranges: 1

If you want to support this kind of per-class meta information in a large hierarchy of classes (say in a custom business class library), it quickly becomes unwieldy to duplicate this code in every subclass. The InstanceCount property or function is a feature introduced and implemented by the initial base class - so why should all subclasses be required to help implement it?
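There is a way to get per-class counts today without touching every subclass. The base class keys the counter on the run-time class in a side table that it owns. Note that this is a swapped-in technique, not the virtual class var feature discussed below. The Registry array, the TClassCount record and the ClassSlot helper are hypothetical names. The lookup is a linear search, and the sketch is not thread-safe.

```pascal
program PerClassRegistry;
{$APPTYPE CONSOLE}

type
  TFruit = class
  public
    constructor Create;
    class function InstanceCount: integer;
  end;
  TApple = class(TFruit);
  TOrange = class(TFruit);

  // Hypothetical side table entry: one counter per run-time class
  TClassCount = record
    Cls: TClass;
    Count: integer;
  end;

var
  Registry: array of TClassCount;

// Returns the registry index for AClass, adding a zeroed entry if needed
function ClassSlot(AClass: TClass): integer;
var
  I: integer;
begin
  for I := 0 to High(Registry) do
    if Registry[I].Cls = AClass then
    begin
      Result := I;
      Exit;
    end;
  Result := Length(Registry);
  SetLength(Registry, Result + 1);
  Registry[Result].Cls := AClass;
  Registry[Result].Count := 0;
end;

constructor TFruit.Create;
var
  Slot: integer;
begin
  inherited Create;
  // Look up the slot first: ClassSlot may reallocate Registry
  Slot := ClassSlot(ClassType); // ClassType is the run-time class
  Inc(Registry[Slot].Count);
end;

class function TFruit.InstanceCount: integer;
var
  Slot: integer;
begin
  Slot := ClassSlot(Self); // in a class method, Self is the class reference
  Result := Registry[Slot].Count;
end;

begin
  TApple.Create.Free;
  TApple.Create.Free;
  TOrange.Create.Free;
  Writeln('Apples: ', TApple.InstanceCount);   // Apples: 2
  Writeln('Oranges: ', TOrange.InstanceCount); // Oranges: 1
end.
```

With this approach, the base class implements the whole feature on its own. The cost is a table lookup on every construction.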


Virtual class variables


What we need is a new language feature - a new kind of class var that is not implemented like a simple global variable, but on a per-class or per-VMT basis. Let's call this imaginary feature virtual class vars - virtual because its value varies with the run-time class, just like a virtual class function implementation varies with the run-time class. An imaginary syntax for this imaginary feature could be:

    class var FInstanceCount: integer; virtual;

This would be the most natural syntax, IMO, but it would require promoting 'virtual' from a simple position-sensitive directive to a full-fledged reserved keyword. Making it a reserved keyword would break existing code that uses 'virtual' as an identifier, so a more realistic syntax would be one that only uses virtual as a directive, like this:

    class virtual var FInstanceCount: integer;

For the same reason, we have the somewhat unintuitive declaration syntax of class sealed and class abstract. With this imaginary syntax and language feature in place, the following code sample:

type
  TFruit = class
  private
    class virtual var FInstanceCount: integer;
  public
    constructor Create;
    class property InstanceCount: integer read FInstanceCount;
  end;
  TApple = class(TFruit)
    //...
  end;
  TOrange = class(TFruit)
    //...
  end;

implementation

constructor TFruit.Create;
begin
  inherited Create;
  Inc(FInstanceCount);
end;

procedure Test;
var
  List: TList;
begin
  List := TList.Create;
  List.Add(TApple.Create);
  List.Add(TApple.Create);
  List.Add(TOrange.Create);
  Writeln('Apples: ', TApple.InstanceCount);
  Writeln('Oranges: ', TOrange.InstanceCount);
  readln;
end;

would now output the expected:


Apples: 2
Oranges: 1

An old report


As it happens, I actually suggested to Borland that they implement class variables with these per-class semantics way back in 1998. Delphi 4 was the current version then, and plain class var and class property were still four Delphi versions away. Here are excerpts from my original report, which was closed as "As Designed" ages ago:



Please add proper class fields. This would also support class properties. Suggested syntax:

type
TFoo = class
private
class FBar: integer;
class procedure SetBar(Value: integer);
public
class property Bar: integer read FBar write SetBar;
end;

class procedure TFoo.SetBar(Value: integer);
begin
if Value <> FBar then
begin
FBar := Value;
end;
end;

This feature is very useful when working much with meta-classes. You can kind-of simulate this by using global variables in the implementation section, but it is not what I want. If you use the global-variable approach, all derived classes will share the same variable. This is not ideal.

Each new derived class should have it's own copy of the variable (just like ClassName and InstanceSize are unique for each class). Both idioms might be useful, though. Maybe there should be a separate syntax for the shared class field thing?

  TFoo = class
private
class FBar: integer; const;

Although the suggested syntax is different (and in hindsight, horrid), this is basically the same feature request we discussed above. As we know now, classic shared class vars have been implemented, while per-class virtual class fields have not. I can't say I blame them (Borland/CodeGear) though - demand hasn't been high for this feature, and I don't know of any other language that implements it (do you?).


Virtual class var implementation


How could such a language feature be implemented? Well, we know how virtual methods (both instance and class methods) are implemented - the compiler assigns a unique slot in the VMT (virtual method table) for each newly introduced virtual method. There is one VMT for each class. Each virtual method has an associated unique slot, and the compiler uses the slot to look up the code address of the virtual method. In BASM, the byte offset of the slot can be retrieved with the VMTOFFSET directive.
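The following small console sketch makes the one-VMT-per-class point concrete. The class and method names are made up for illustration. The same virtual class method slot resolves to different code, depending on which class reference the call goes through.

```pascal
program VirtualClassMethods;
{$APPTYPE CONSOLE}

type
  TFruit = class
  public
    class function Kind: string; virtual;
  end;
  TApple = class(TFruit)
  public
    class function Kind: string; override;
  end;
  TFruitClass = class of TFruit;

class function TFruit.Kind: string;
begin
  Result := 'fruit';
end;

class function TApple.Kind: string;
begin
  Result := 'apple';
end;

var
  C: TFruitClass;
begin
  C := TFruit;
  Writeln(C.Kind);  // fruit: the slot is looked up in TFruit's VMT
  C := TApple;
  Writeln(C.Kind);  // apple: the same slot, looked up in TApple's VMT
  // A class reference is a pointer to that class's VMT
  Writeln(Pointer(TFruit) <> Pointer(TApple)); // TRUE
end.
```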


VMT slot per field


What if we extended the VMT to contain one extra slot per declared virtual class var? This would be a straightforward solution. The main benefit is that the VMT of classes without virtual class vars (i.e. 100% of existing classes) would not change at all. The problem is that the VMT is stored in the code segment - and keeping writable data variables there is Not a Good Idea (tm).


As we have seen in our recent self-modifying code hacks, you have to be careful when mixing code and data. Otherwise you run into access violations and DEP (Data Execution Prevention) problems. In particular, to write data to the code segment you have to change the access rights of the code page(s) the data resides in. To be a good citizen, you should restore the original rights when you're done, like this helper routine does:

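// Requires Windows in the uses clause (VirtualProtect, FlushInstructionCache)
procedure PatchCodeDWORD(Code: PDWORD; Value: DWORD);
// Self-modifying code - change one DWORD in the code segment
var
  RestoreProtection, Ignore: DWORD;
begin
  // Make the page writable, remembering its original protection
  if VirtualProtect(Code, SizeOf(Code^), PAGE_EXECUTE_READWRITE,
    RestoreProtection) then
  begin
    Code^ := Value;
    // Restore the original protection and flush stale instructions
    VirtualProtect(Code, SizeOf(Code^), RestoreProtection, Ignore);
    FlushInstructionCache(GetCurrentProcess, Code, SizeOf(Code^));
  end;
end;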

And doing this is not thread-safe, of course. If you're really unlucky, another thread could change the rights again before you get the chance to perform the write. So this is not something you want to do every time you change a virtual class var. Strike solution one.


Virtual ClassFieldTable


Doing something to the VMT is a good idea, but storing the actual live data there is not. As usual, adding an extra level of indirection solves the problem. We extend the VMT with a new magic slot - let's call it ClassFieldTable. It's implicit that we're talking about virtual class fields here; otherwise the slot wouldn't belong in the VMT. This slot points to a record structure in the global data segment. The record contains fields that correspond to all the virtual class vars that have been declared in the class or inherited from the parent class. Each derived class has its own unique copy of this record in the data segment, and the ClassFieldTable slot in its VMT points to that copy.


Now we have solved the cannot-write-data-to-code-pages problem. The ClassFieldTable pointer is still part of the VMT and stored in a code page. However, the linker/loader fixes it up to point to the correct global record variable, and it never changes at run-time. Using implicitly declared (i.e. compiler-generated) global record variables for each class' virtual class vars has an added benefit. We get the compiler magic that finalizes managed fields in the record (AnsiString, WideString, interface, Variant and dynamic array) for free.


Compiler's implementation


Now let's imagine what the compiler would have to do to implement virtual class vars. To do that, we'll apply some pseudo-code to a variant of the imaginary sample code above. In the modified example, the three classes form a three-generation inheritance chain, and I've added another virtual class var to one of the descendant classes:

type
  TFruit = class
  private
    class virtual var FInstanceCount: integer;
  public
    constructor Create;
    class property InstanceCount: integer read FInstanceCount;
  end;
  TCitrus = class(TFruit)
  end;
  TOrange = class(TCitrus)
  private
    class virtual var ClassDescription: string;
  end;

And here is the pseudo-code that tries to illustrate what the compiler would do to implement this code sample.

type
  TFruit = class
  private
    class virtual var FInstanceCount: integer;
  public
    constructor Create;
    class property InstanceCount: integer read FInstanceCount;
  end;
  TCitrus = class(TFruit)
  end;
  TOrange = class(TCitrus)
  private
    class virtual var ClassDescription: string;
  end;

// Compiler generated types and variables (pseudo-code)
var
  // Global variables used for per-class virtual class fields
  FruitClassVars: record
    FInstanceCount: integer;
  end;
  CitrusClassVars: record    // inherits field
    FInstanceCount: integer;
  end;
  OrangeClassVars: record    // inherits field, introduces new field
    FInstanceCount: integer;
    ClassDescription: string;
  end;

// New VMT slot initialization, generated by the compiler (pseudo-code):
//   TFruit VMT:  ClassVarTable := @FruitClassVars;
//   TCitrus VMT: ClassVarTable := @CitrusClassVars;
//   TOrange VMT: ClassVarTable := @OrangeClassVars;
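The pseudo-code can be approximated by hand in today's Delphi. In this sketch, a virtual class function stands in for the imaginary ClassVarTable VMT slot and returns the address of a per-class global record. It only illustrates the indirection idea. The names are hypothetical, and it uses the two-level TFruit/TApple/TOrange hierarchy from the earlier examples. Unlike the proposed feature, every subclass still has to override one function and declare its own record variable.

```pascal
program ClassVarTableEmulation;
{$APPTYPE CONSOLE}

type
  // Hand-written stand-in for the compiler-generated per-class record
  PFruitClassVars = ^TFruitClassVars;
  TFruitClassVars = record
    FInstanceCount: integer;
  end;

  TFruit = class
  protected
    // Stand-in for the imaginary ClassVarTable VMT slot
    class function ClassVarTable: PFruitClassVars; virtual;
  public
    constructor Create;
    class function InstanceCount: integer;
  end;
  TApple = class(TFruit)
  protected
    class function ClassVarTable: PFruitClassVars; override;
  end;
  TOrange = class(TFruit)
  protected
    class function ClassVarTable: PFruitClassVars; override;
  end;

var
  // One zero-initialized global record per class
  FruitClassVars, AppleClassVars, OrangeClassVars: TFruitClassVars;

class function TFruit.ClassVarTable: PFruitClassVars;
begin
  Result := @FruitClassVars;
end;

class function TApple.ClassVarTable: PFruitClassVars;
begin
  Result := @AppleClassVars;
end;

class function TOrange.ClassVarTable: PFruitClassVars;
begin
  Result := @OrangeClassVars;
end;

constructor TFruit.Create;
begin
  inherited Create;
  // Virtual dispatch picks the record of the run-time class
  Inc(ClassVarTable^.FInstanceCount);
end;

class function TFruit.InstanceCount: integer;
begin
  Result := ClassVarTable^.FInstanceCount;
end;

begin
  TApple.Create.Free;
  TApple.Create.Free;
  TOrange.Create.Free;
  Writeln('Apples: ', TApple.InstanceCount);   // Apples: 2
  Writeln('Oranges: ', TOrange.InstanceCount); // Oranges: 1
end.
```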

It's interesting to see that this closely resembles the implementation suggestion I made some 9 years ago. From the same report I showed an excerpt of above:



How to implement this:



The shared-variable type of class field could be implemented by using a space from the global data segment. The underlying implementation would thus be the same as using a global variable - only the syntax would be more logical (than using an explicit global variable).

The each-class-one-variable type of class field could be implemented by adding two fields to the VMT:

ClassInstanceSize : Integer
ClassInstanceData: Pointer;

The ClassInstanceSize would give the number of bytes allocated for class fields in [the] class. The ClassInstanceData would point to the block of memory containing the class fields. This memory block should be in the global data segment, initialized to all zeros.

At compile-time these fields would be setup while creating the VMT tables. A class that inherits from another class and adds its own fields would have the ClassInstanceSize = Parent.ClassInstanceSize + SizeOf(class fields in this class).


I now think that ClassInstanceSize (or ClassVarTableSize) does not need to be kept in the VMT. The compiler needs this information for its internal bookkeeping, but it is not strictly needed at runtime. In a way, this is the same case as for virtual methods. The compiler keeps track of the number of virtual methods in each class, as part of the compile-time class information stored in the .dcu. The code it generates does not need that count, so there is no VirtualMethodCount field in the VMT. The same logic applies to our new virtual class fields and the new ClassVarTable slot.


To be continued...


This post is getting a bit long, so I've decided to split it in two. In Part II we will look at Patrick's hack, a workaround for the lack of proper language-level virtual class vars.

Friday, April 27, 2007

Psst, a special price for you, my friend...

An item that may interest you is available on eBay now, click here.

PS: I still have another Delphi 2007 licence.

Tuesday, April 17, 2007

DN4DP#7: Inlined routines

This post continues the series of The Delphi Language Chapter teasers from Jon Shemitz’ .NET 2.0 for Delphi Programmers book. Last time we covered the new for in loop and the pattern for introducing enumeration support to your own classes. This post includes the section on the inlining support of the compiler.


"Inlined routines

From Chapter 4 we know that the .NET Just-In-Time (JIT) compiler will automatically perform optimizations, including inlining small and simple methods at call sites. In addition to this JIT inlining, Delphi now supports explicit inlining of non-virtual routines, both in .NET and Win32.

function InlineMeToo(const Value: integer): integer; inline;
begin
  Result := Value * 200 div 300;
end;

The inline directive is just a hint to the Delphi compiler that it should try to expand the code inline whenever the routine is called, at compile time. The exact rules of what and when it can be inlined differ slightly between the two platforms.

In .NET, Delphi inlining occurs at the IL level. This means that the IL code generated at the call site must obey CLR limitations and rules regarding member visibility. This limitation is mitigated by the less constrained JIT inlining that occurs at runtime.

The Win32 inlining support was made more aggressive in Delphi 2006 - now even methods that access private members can be inlined. Assembly (BASM) code cannot be inlined.

The rest of the inlining restrictions are common to both platforms; the most important ones are:


  • no inlining across package boundaries
  • the inlined routine cannot access implementation section identifiers
  • the call site must have access to all identifiers used in the inlined routine


Note The last point means that unless the call site's unit uses the units required by the routine, the routine cannot be inlined. When this happens, the compiler emits a hint like this:

[Pascal Hint] InlinedRoutinesU.pas(14): H2443 Inline function 'InlineMe' has not been expanded because unit 'RequiredUnit' is not specified in USES list

To resolve the issue, add the missing unit name to the call site's uses clause.


The {$INLINE ON/AUTO/OFF} compiler directive can be used both at the definition and at the call site. The OFF mode turns off all inlining. The default ON setting tries to inline routines explicitly marked inline. AUTO additionally tries to inline all small routines (consisting of fewer than 32 bytes of machine or IL code).

Caution Be careful not to inline too much code; the potential increase in code size may actually reduce performance.

"


Update [May 12th 2007]:


In addition to the inline restrictions mentioned above I should have included the issue of implementation order.


If the inline method is called from the same unit, the compiler must already have "seen" its implementation. In other words, the implementation of the inline method should precede the call.

For example:

type
  TFoo = class
    procedure A; inline;
    procedure B;
  end;

procedure TFoo.B;
begin
  A; // note: Call to A is *not* inlined here
end;

procedure TFoo.A;
begin
  // code
end;

Change this to:

type
  TFoo = class
    procedure A; inline;
    procedure B;
  end;

procedure TFoo.A;
begin
  // code
end;

procedure TFoo.B;
begin
  A; // note: Call to A *is* inlined here
end;
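Here is a minimal sketch that pulls the pieces together. It reuses InlineMeToo from the excerpt, implements it before its callers as the update above requires, and switches {$INLINE} at the call sites. The two caller procedures are made up for illustration. Whether a given call is actually expanded is up to the compiler, and the CPU view in the debugger is one way to check.

```pascal
program InlineDirectiveDemo;
{$APPTYPE CONSOLE}

// Implemented before its callers, so the compiler has "seen" it
function InlineMeToo(const Value: integer): integer; inline;
begin
  Result := Value * 200 div 300;
end;

{$INLINE OFF}
procedure CallWithoutInlining;
begin
  // Compiled as an ordinary call: inlining is switched off here
  Writeln(InlineMeToo(30));
end;

{$INLINE ON}
procedure CallWithInlining;
begin
  // Default mode: the call is a candidate for inline expansion
  Writeln(InlineMeToo(30));
end;

begin
  CallWithoutInlining; // 20
  CallWithInlining;    // 20 - same result, possibly different machine code
end.
```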

Wednesday, April 11, 2007

Subversion in Delphi's Tools menu

Joe White writes about how to add Tools menu items with Subversion commands here.

For some reason, the Submit button didn't work on his Comments page (running IE6), so I'll just write my comment here:

Nice, thanks!

I've been meaning to do the same thing for a while, but kept postponing it. Shame about the space-before-macro requirement in the Tools menu - you should probably log it in QC.

I didn't have Ruby installed, so I wrote a simple one-line .Bat file instead:

"c:/program files/tortoisesvn/bin/tortoiseproc.exe" /command:%1 /path:%2 /notempfile

Then I create the Tools items with:
Program: c:\windows\system32\cmd.exe
Parameters: /C C:\SvnPas\Utils\Batch\SvnCmd.Bat diff $EDNAME $SAVEALL

Works fine!

Sunday, April 08, 2007

Hack#16: Published field RTTI replacement trick

We're back to fixing the interesting (learning-wise) problem of the flickering TProgressBar on Windows Vista. We have already looked at two relatively dirty and intrusive hacks. One overwrites the TClass reference inside each progress bar instance, and the other overwrites the dynamic method table pointer of the original TProgressBar VMT. There are other related code page overwrite hacks that we may look at in the future (overwriting a virtual method slot, for instance). This time, however, we will look at a much simpler and (IMO) more elegant solution: tricking the compiler into using a fixed version of TProgressBar in the RTTI it generates for the component field on the form.

As you know (if you have read the published fields and details articles), the IDE inserts component fields into the unnamed published section of your form class declaration as you drop components and controls onto the form at design time. Since the fields are published, the compiler generates RTTI for them - including the name, type and offset of the field.

For instance, after we create a new form and drop a couple of labels, a button and a progress bar on it, the IDE has generated the following code:

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Variants, Classes,
  Graphics, Controls, Forms,
  Dialogs, ComCtrls, StdCtrls;

type
  TForm1 = class(TForm)
    Label1: TLabel;
    Label2: TLabel;
    Button1: TButton;
    ProgressBar1: TProgressBar;
  private
    { Private declarations }
  public
    { Public declarations }
  end;

var
  Form1: TForm1;

implementation

{$R *.dfm}

end.

The published field RTTI includes the type of each field - encoded as a TClass reference. But how does the compiler know which TClass to use? By using the normal Pascal scoping rules, of course: identifiers imported through the uses clause are searched from the last unit listed to the first. TLabel and TButton both refer to the classes defined in the StdCtrls unit. For TProgressBar, the compiler first checks the StdCtrls unit. It fails to find any TProgressBar class there, so it looks in the ComCtrls unit and finds it there.


Here lies the clue to our trick - we can simply add another unit to the interface uses clause, one that contains a fixed version of TProgressBar. We just have to make sure this unit is listed after the unit that contains the original TProgressBar (ComCtrls in this case). Let's get to it.


First we write (yet another version of) the fixed TProgressBar class and put it inside an aptly named unit.

unit HVProgressBarVistaFix;

interface

uses Messages, ComCtrls;

type
  TProgressBar = class(ComCtrls.TProgressBar)
  private
    procedure WMEraseBkgnd(var Message: TWmEraseBkgnd);
      message WM_ERASEBKGND;
  end;

implementation

procedure TProgressBar.WMEraseBkgnd(var Message: TWmEraseBkgnd);
begin
  DefaultHandler(Message);
end;

end.

Notice two things. First, the class must have the same name as the original class, 'TProgressBar'. Second, the class we inherit from has to be explicitly qualified with the unit it comes from, 'ComCtrls.TProgressBar'. Without the qualification, you get a compiler error message like

[Error] HVProgressBarVistaFix.pas(8): Type 'TProgressBar' is not yet completely defined
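You can try the same-name-subclass pattern in a plain console program. In this sketch, Classes.TList stands in for ComCtrls.TProgressBar, purely for illustration. The local declaration shadows the one from the Classes unit. An unqualified TList therefore means the new class, while the qualified name still reaches the original.

```pascal
program ShadowedClassDemo;
{$APPTYPE CONSOLE}

uses
  Classes;

type
  // Same name as Classes.TList; the ancestor must be unit-qualified,
  // otherwise the compiler reports "not yet completely defined"
  TList = class(Classes.TList);

var
  L: TList; // resolves to the TList declared in this program
begin
  L := TList.Create;
  try
    Writeln(L.ClassType = Classes.TList); // FALSE: our class was created
    Writeln(L is Classes.TList);          // TRUE: it still descends from the original
  finally
    L.Free;
  end;
end.
```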


With this simple unit in hand, we can fix the progress bar flickering simply by referencing the HVProgressBarVistaFix unit from all forms that use progress bars (search your code for TProgressBar). Make sure it is listed last in the uses clause of the interface section.

uses
  Windows, Messages, SysUtils, Variants, Classes,
  Graphics, Controls, Forms,
  Dialogs, ComCtrls, StdCtrls, HVProgressBarVistaFix;

This ensures that the compiler inserts the TClass reference of our fixed version of the TProgressBar class into the published field RTTI tables of the form. When the streaming system reads the .dfm at runtime it uses these published field tables (read the details here) to find the TClass reference (or TComponentClass reference, to be exact) and uses it (and the virtual constructor of TComponent) to create the component. Because we have replaced the ComCtrls.TProgressBar class reference with our fixed version in HVProgressBarVistaFix.TProgressBar, it is our class that will be created.


With this simple kind of hack you can introduce any number of fixes, overriding virtual, dynamic or message methods, for instance. Look ma - no dirty hands! ;).

Tuesday, April 03, 2007

DN4DP#6: Enumerating collections

This post continues the series of The Delphi Language Chapter teasers from Jon Shemitz’ .NET 2.0 for Delphi Programmers book. Last time we showed how it is now possible to override the meaning of language operators. This time we'll cover the new for in loop and the pattern for introducing enumeration support to your own classes.


"Enumerating collections

To make it easier and more convenient to enumerate the contents of collections, the traditional for statement has been extended into a for in statement. In general, the for in syntax is:

var
  Element: ElementType;
begin
  for Element in Collection do
    Writeln(Element.Member);
end;

ElementType must be assignment compatible with the type of the actual elements stored inside the collection. The collection must implement the enumerator pattern or be of an array, string or set type. You only have read access to the Element itself, but you can change any properties and fields it references.
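For the built-in cases, here is a small sketch of for in over a string, a static array and a set. The helper functions and types are made up for illustration. Note that each loop only reads its element variable.

```pascal
program ForInBuiltins;
{$APPTYPE CONSOLE}

type
  TDay = (Mon, Tue, Wed, Thu, Fri);
  TDays = set of TDay;
  TPrimes = array[0..3] of integer;

const
  Primes: TPrimes = (2, 3, 5, 7);

// for in over a string visits each character
function CountVowels(const S: string): integer;
var
  Ch: Char;
begin
  Result := 0;
  for Ch in S do
    if Ch in ['a', 'e', 'i', 'o', 'u'] then
      Inc(Result);
end;

// for in over an array visits each element
function Total(const A: TPrimes): integer;
var
  N: integer;
begin
  Result := 0;
  for N in A do
    Inc(Result, N);
end;

// for in over a set visits each member
function MemberCount(const Days: TDays): integer;
var
  D: TDay;
begin
  Result := 0;
  for D in Days do
    Inc(Result);
end;

begin
  Writeln(CountVowels('Delphi'));   // 2
  Writeln(Total(Primes));           // 17
  Writeln(MemberCount([Mon, Wed])); // 2
end.
```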

All .NET collections and most VCL container classes like TList and TStrings implement the required pattern, so now you can transform old code like

var
  S: string;
  i: integer;
begin
  for i := 0 to MyStrings.Count-1 do
  begin
    S := MyStrings[i];
    writeln(S);
  end;
end;

into the simpler, less error prone, but equivalent

var
  S: string;
begin
  for S in MyStrings do
    writeln(S);
end;

To enable for in for your own collection classes, you need to implement the enumerator pattern. This involves writing a GetEnumerator function that returns an instance (class, record or interface) that implements a Boolean MoveNext function and a Current property. In .NET you can also achieve this by implementing the IEnumerable interface. In Win32 these methods must be public.

type
  TMyObjectsEnumerator = class
  public
    function GetCurrent: integer;
    function MoveNext: Boolean;
    property Current: integer read GetCurrent;
  end;
  TMyObjects = class
  public
    function GetEnumerator: TMyObjectsEnumerator;
  end;

The EnumeratingCollections project demonstrates the differences between the old manual enumeration loops and the new for in loops. It also includes an example of how to write your own classes that support for in enumeration."
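Here is a minimal, self-contained Win32 sketch of the pattern. It fills in the declarations above with a hypothetical dynamic-array backing store, plus Add, Count and SumOf helpers that are also made up for illustration. The enumerator starts before the first element because the loop calls MoveNext before reading Current. Since the enumerator here is a class, the compiler frees it when the loop ends.

```pascal
program EnumeratorPattern;
{$APPTYPE CONSOLE}

type
  TMyObjects = class;

  TMyObjectsEnumerator = class
  private
    FOwner: TMyObjects;
    FIndex: integer;
  public
    constructor Create(AOwner: TMyObjects);
    function GetCurrent: integer;
    function MoveNext: Boolean;
    property Current: integer read GetCurrent;
  end;

  TMyObjects = class
  private
    FItems: array of integer; // hypothetical backing store
  public
    procedure Add(Value: integer);
    function Count: integer;
    function GetEnumerator: TMyObjectsEnumerator;
  end;

constructor TMyObjectsEnumerator.Create(AOwner: TMyObjects);
begin
  inherited Create;
  FOwner := AOwner;
  FIndex := -1; // the loop calls MoveNext before the first Current
end;

function TMyObjectsEnumerator.GetCurrent: integer;
begin
  Result := FOwner.FItems[FIndex];
end;

function TMyObjectsEnumerator.MoveNext: Boolean;
begin
  Inc(FIndex);
  Result := FIndex < FOwner.Count;
end;

procedure TMyObjects.Add(Value: integer);
begin
  SetLength(FItems, Length(FItems) + 1);
  FItems[High(FItems)] := Value;
end;

function TMyObjects.Count: integer;
begin
  Result := Length(FItems);
end;

function TMyObjects.GetEnumerator: TMyObjectsEnumerator;
begin
  Result := TMyObjectsEnumerator.Create(Self);
end;

// Sums the collection using for in
function SumOf(Objects: TMyObjects): integer;
var
  Value: integer;
begin
  Result := 0;
  for Value in Objects do
    Inc(Result, Value);
end;

var
  Objects: TMyObjects;
  Value: integer;
begin
  Objects := TMyObjects.Create;
  try
    Objects.Add(10);
    Objects.Add(20);
    Objects.Add(30);
    for Value in Objects do
      Writeln(Value);         // 10, 20, 30 on separate lines
    Writeln(SumOf(Objects));  // 60
  finally
    Objects.Free;
  end;
end.
```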



Copyright © 2004-2007 by Hallvard Vassbotn