Tuesday, May 02, 2006

Under the hood of published methods

Now that we have covered what published methods are, how the IDE and VCL use them in .DFM streaming, and how to use them polymorphically, we are ready to dive deeper and see how they are implemented under the hood.

If you have been following this series of articles about the polymorphic features of the Delphi language, you will have noticed that the VMT contains a MethodTable field, currently defined as an untyped pointer. By scrutinizing the TObject methods that access this table, MethodName and MethodAddress, I’ve been able to write approximate Pascal declarations to describe the structure of the MethodTable.

type
  PPublishedMethod = ^TPublishedMethod;
  TPublishedMethod = packed record
    Size: word;
    Address: Pointer;
    Name: {packed} Shortstring; // really string[Length(Name)]
  end;
  TPublishedMethods = packed array[0..High(Word)-1] of TPublishedMethod;
  PPmt = ^TPmt;
  TPmt = packed record
    Count: Word;
    Methods: TPublishedMethods; // really [0..Count-1]
  end;

  PVmt = ^TVmt;
  TVmt = packed record
    // …
    MethodTable : PPmt;
    // …
  end;

As you can see above, the MethodTable field now has the type PPmt. It points to a record that contains the number of published methods in this class, followed by a packed array of TPublishedMethod records. Each record contains a size (used to find the start of the next record), the address of the method's code and a packed shortstring containing the name of the method.

Notice that the Size field appears to be redundant. In all my testing the value of Size has always been equal to the expression:

  Size :=  SizeOf(Size) + SizeOf(Address) + SizeOf(Name[0]) + Length(Name);

In other words, the next TPublishedMethod record starts just after the last byte of the method name. I’m not sure why Borland decided to add the Size field, but one possible reason is to allow the contents of the TPublishedMethod record to be extended in the future. One natural extension would be to include information about the parameters and calling convention of the method. Then Size would be adjusted accordingly and old code unaware of the new fields would still work fine (see the sidebar Detecting extra published method data below).
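
To make the arithmetic concrete, here is a minimal sketch that evaluates the same expression for a given method name. ExpectedEntrySize is a hypothetical helper name, not part of the RTL, and it simply assumes the record layout described above.

```pascal
program ExpectedSizeDemo;
{$APPTYPE CONSOLE}

// Hypothetical helper: the entry size implied by the layout above,
// i.e. Size field + Address field + length byte + name characters.
function ExpectedEntrySize(const Name: ShortString): Integer;
begin
  Result := SizeOf(Word)      // the Size field itself
          + SizeOf(Pointer)   // the Address field
          + SizeOf(Byte)      // Name[0], the length byte
          + Length(Name);     // the characters of the name
end;

begin
  // On a 32-bit target SizeOf(Pointer) is 4, so for this
  // 14-character name the result is 2 + 4 + 1 + 14 = 21.
  Writeln(ExpectedEntrySize('FirstPublished'));
end.
```

On a target with a different pointer size the Address term changes accordingly, which is one more reason to step through the table using the stored Size rather than a hard-coded constant.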

Now that we have a few data structures to work with we can start writing some utility routines.

function GetVmt(AClass: TClass): PVmt;
begin
  Result := PVmt(AClass);
  Dec(Result);
end;

function GetPmt(AClass: TClass): PPmt;
var
  Vmt: PVmt;
begin
  Vmt := GetVmt(AClass);
  if Assigned(Vmt)
  then Result := Vmt.MethodTable
  else Result := nil;
end;

function GetPublishedMethodCount(AClass: TClass): integer;
var
  Pmt: PPmt;
begin
  Pmt := GetPmt(AClass);
  if Assigned(Pmt)
  then Result := Pmt.Count
  else Result := 0;
end;

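function GetPublishedMethod(AClass: TClass; Index: integer): PPublishedMethod;
var
  Pmt: PPmt;
begin
  Pmt := GetPmt(AClass);
  // Reject negative indexes as well as indexes past the end of the table
  if Assigned(Pmt) and (Index >= 0) and (Index < Pmt.Count) then
  begin
    Result := @Pmt.Methods[0];
    while Index > 0 do
    begin
      // Records are variable-sized; Size is the offset to the next one
      Inc(PChar(Result), Result.Size);
      Dec(Index);
    end;
  end
  else
    Result := nil;
end;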

First we have our old friend GetVmt to get a pointer to the magic part of the VMT given a class reference. Using this and the new PPmt type we can write the GetPmt function above, which returns a pointer to the class’ published method table. Then there are two functions that return the number of published methods and a specific published method, given an index from 0 to Count-1. Using these utility routines we can now write some test code to dump all the published methods of a class (and its parent classes).

procedure DumpPublishedMethods(AClass: TClass);
var
  i : integer;
  Method: PPublishedMethod;
begin
  while Assigned(AClass) do
  begin
    writeln('Published methods in ', AClass.ClassName);
    for i := 0 to GetPublishedMethodCount(AClass)-1 do
    begin
      Method := GetPublishedMethod(AClass, i);
      writeln(Format('%d. MethodAddr = %p, Name = %s',
        [i, Method.Address, Method.Name]));
    end;
    AClass := AClass.ClassParent;
  end;
end;

This dumping works fine, but it has less than ideal performance complexity. GetPublishedMethod has to iterate to the Index’th method on each call, giving the Dump routine quadratic, or O(n^2), performance complexity (where n is the number of published methods in the class). Now, most classes do not have that many published methods and the work done inside the inner loop is minimal, so in practice you should never experience this as a problem.

However, my performance obsession makes me want to speed this up, at least theoretically. The packed array of TPublishedMethod records can be seen as a primitive singly linked list – random access is slow – so an iterator-based technique should improve performance. Let’s write two more utility routines.

function GetFirstPublishedMethod(AClass: TClass): PPublishedMethod;
begin
  Result := GetPublishedMethod(AClass, 0);
end;

function GetNextPublishedMethod(AClass: TClass; PublishedMethod:
  PPublishedMethod): PPublishedMethod;
begin
  Result := PublishedMethod;
  if Assigned(Result) then
    Inc(PChar(Result), Result.Size);
end;

These two routines constitute a typical GetFirst/GetNext pair of iterators. The first function returns a pointer to the first published method, while the second returns a pointer to the one that follows the record it is given. Notice that it is the responsibility of the caller to call GetNextPublishedMethod the correct number of times (by using GetPublishedMethodCount). Now we can rewrite the dumping routine, making it slightly faster.

procedure DumpPublishedMethodsFaster(AClass: TClass);
var
  i : integer;
  Method: PPublishedMethod;
begin
  while Assigned(AClass) do
  begin
    writeln('Published methods in ', AClass.ClassName);
    Method := GetFirstPublishedMethod(AClass);
    for i := 0 to GetPublishedMethodCount(AClass)-1 do
    begin
      writeln(Format('%d. MethodAddr = %p, Name = %s',
        [i, Method.Address, Method.Name]));
      Method := GetNextPublishedMethod(AClass, Method);
    end;
    AClass := AClass.ClassParent;
  end;
end;
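
The same linear walk can also be packaged as a standalone program that collects the names into a TStrings instead of writing them out. This is only a sketch under stated assumptions: TDemo, TPmtHeader and GetPublishedMethodNames are hypothetical names, and the table is located through the System unit's vmtMethodTable offset constant rather than through the TVmt record, which assumes a compiler version that still exposes that constant to Pascal code.

```pascal
program ListPublishedNames;
{$APPTYPE CONSOLE}

uses
  Classes;

type
  // Same layout as the declarations earlier in the article
  PPublishedMethod = ^TPublishedMethod;
  TPublishedMethod = packed record
    Size: Word;
    Address: Pointer;
    Name: ShortString;
  end;
  PPmtHeader = ^TPmtHeader;      // hypothetical name
  TPmtHeader = packed record
    Count: Word;
    First: TPublishedMethod;     // first of Count variable-sized entries
  end;

  {$M+}
  TDemo = class                  // hypothetical test class
  published
    procedure Alpha;
    procedure Beta;
  end;
  {$M-}

procedure TDemo.Alpha;
begin
end;

procedure TDemo.Beta;
begin
end;

procedure GetPublishedMethodNames(AClass: TClass; Names: TStrings);
var
  Header: PPmtHeader;
  Method: PPublishedMethod;
  i: Integer;
begin
  while Assigned(AClass) do
  begin
    // Assumption: vmtMethodTable is the (negative) offset of the
    // method table pointer relative to the class reference.
    Header := PPointer(PAnsiChar(AClass) + vmtMethodTable)^;
    if Assigned(Header) then
    begin
      Method := @Header.First;
      for i := 0 to Header.Count - 1 do
      begin
        Names.Add(string(Method.Name));
        // Step Size bytes forward to reach the next entry
        Method := PPublishedMethod(PAnsiChar(Method) + Method.Size);
      end;
    end;
    AClass := AClass.ClassParent;
  end;
end;

var
  Names: TStringList;
  i: Integer;
begin
  Names := TStringList.Create;
  try
    GetPublishedMethodNames(TDemo, Names);
    for i := 0 to Names.Count - 1 do
      Writeln(Names[i]);
  finally
    Names.Free;
  end;
end.
```

Because each step uses the stored Size, the walk does not depend on the record ending right after the name.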

Iterating over or dumping all published methods in a class is not normally very useful. TObject already contains methods to perform published method lookups using MethodAddress and MethodName. These are written in efficient assembly, but that also makes them hard to read. I used them to determine the format of the published method table data structures above. Here are some corresponding routines in Pascal.

function FindPublishedMethodByName(AClass: TClass;
  const AName: ShortString): PPublishedMethod;
var
  i : integer;
begin
  while Assigned(AClass) do
  begin
    Result := GetFirstPublishedMethod(AClass);
    for i := 0 to GetPublishedMethodCount(AClass)-1 do
    begin
      // Note: Length(ShortString) expands to efficient inline code
      if (Length(Result.Name) = Length(AName)) and
         (StrLIComp(@Result.Name[1], @AName[1], Length(AName)) = 0) then
        Exit;
      Result := GetNextPublishedMethod(AClass, Result);
    end;
    AClass := AClass.ClassParent;
  end;
  Result := nil;
end;

function FindPublishedMethodByAddr(AClass: TClass;
  AAddr: Pointer): PPublishedMethod;
var
  i : integer;
begin
  while Assigned(AClass) do
  begin
    Result := GetFirstPublishedMethod(AClass);
    for i := 0 to GetPublishedMethodCount(AClass)-1 do
    begin
      if Result.Address = AAddr then
        Exit;
      Result := GetNextPublishedMethod(AClass, Result);
    end;
    AClass := AClass.ClassParent;
  end;
  Result := nil;
end;

function FindPublishedMethodAddr(AClass: TClass; const
  AName: ShortString): Pointer;
var
  Method: PPublishedMethod;
begin
  Method := FindPublishedMethodByName(AClass, AName);
  if Assigned(Method)
  then Result := Method.Address
  else Result := nil;
end;

function FindPublishedMethodName(AClass: TClass;
  AAddr: Pointer): Shortstring;
var
  Method: PPublishedMethod;
begin
  Method := FindPublishedMethodByAddr(AClass, AAddr);
  if Assigned(Method)
  then Result := Method.Name
  else Result := '';
end;

The first two functions find a published method by name or address and return a pointer to the TPublishedMethod record that describes the method. Having direct access to this record deep inside the RTTI data structures can come in handy later. The last two functions return the address or the name directly and correspond to TObject's MethodAddress and MethodName, respectively.
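
For comparison, here is a small self-contained sketch of the built-in lookups that these routines mirror. TDemo is a hypothetical class; MethodAddress and MethodName are the TObject class functions discussed above, called here on the class reference.

```pascal
program MethodLookupDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  {$M+}
  TDemo = class            // hypothetical test class
  published
    procedure Hello;
  end;
  {$M-}

procedure TDemo.Hello;
begin
end;

var
  Addr: Pointer;
begin
  // Name -> address, the counterpart of FindPublishedMethodAddr
  Addr := TDemo.MethodAddress('Hello');
  Writeln(Format('%p', [Addr]));

  // Address -> name, the counterpart of FindPublishedMethodName
  Writeln(TDemo.MethodName(Addr));

  // An unknown name yields nil rather than raising an exception
  if TDemo.MethodAddress('NotThere') = nil then
    Writeln('NotThere was not found');
end.
```

Feeding the result of one lookup into the other, as done here, is a quick sanity check when comparing a Pascal reimplementation against the RTL's assembly versions.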

Finally we can write a class and some code to test the routines we have written.

type
  {$M+}
  TMyClass = class
  published
    procedure FirstPublished;
    procedure SecondPublished(A: integer);
    procedure ThirdPublished(A: integer); stdcall;
    function FourthPublished(A: TComponent): TComponent; stdcall;
    procedure FifthPublished(Component: TComponent); stdcall;
    function SixthPublished(A: string; Two, Three, Four,
      Five, Six: integer): string; pascal;
  end;

procedure TMyClass.FirstPublished;
begin
end;

procedure TMyClass.SecondPublished;
begin
end;

procedure TMyClass.ThirdPublished;
begin
end;

function TMyClass.FourthPublished;
begin
  Result := nil;
end;

procedure TMyClass.FifthPublished;
begin
end;

function TMyClass.SixthPublished;
begin
end;

procedure DumpMethod(Method: PPublishedMethod);
begin
  if Assigned(Method)
  then Writeln(Format('%p=%s', [Method.Address, Method.Name]))
  else Writeln('nil');
end;

procedure Test;
begin
  DumpPublishedMethods(TMyClass);
  DumpPublishedMethodsFaster(TMyClass);
  DumpMethod(FindPublishedMethodByName(TMyClass, 'FirstPublished'));
  DumpMethod(FindPublishedMethodByName(TMyClass,
    FindPublishedMethodName(TMyClass, @TMyClass.SecondPublished)));
  DumpMethod(FindPublishedMethodByAddr(TMyClass, @TMyClass.ThirdPublished));
  DumpMethod(FindPublishedMethodByAddr(TMyClass,
    FindPublishedMethodAddr(TMyClass, 'FourthPublished')));
  DumpMethod(FindPublishedMethodByAddr(TMyClass,
    FindPublishedMethodByName(TMyClass, 'FifthPublished').Address));
  DumpMethod(FindPublishedMethodByAddr(TMyClass, @TMyClass.SixthPublished));
  DumpMethod(FindPublishedMethodByName(TMyClass, 'NotThere'));
  DumpMethod(FindPublishedMethodByAddr(TMyClass, nil));
end;

begin
  Test;
  readln;
end.

The output from this little test snippet is:

Published methods in TMyClass
0. MethodAddr = 00412BCC, Name = FirstPublished
1. MethodAddr = 00412BD0, Name = SecondPublished
2. MethodAddr = 00412BD4, Name = ThirdPublished
3. MethodAddr = 00412BDC, Name = FourthPublished
4. MethodAddr = 00412BE8, Name = FifthPublished
5. MethodAddr = 00412BF0, Name = SixthPublished
Published methods in TObject
Published methods in TMyClass
0. MethodAddr = 00412BCC, Name = FirstPublished
1. MethodAddr = 00412BD0, Name = SecondPublished
2. MethodAddr = 00412BD4, Name = ThirdPublished
3. MethodAddr = 00412BDC, Name = FourthPublished
4. MethodAddr = 00412BE8, Name = FifthPublished
5. MethodAddr = 00412BF0, Name = SixthPublished
Published methods in TObject
00412BCC=FirstPublished
00412BD0=SecondPublished
00412BD4=ThirdPublished
00412BDC=FourthPublished
00412BE8=FifthPublished
00412BF0=SixthPublished
nil
nil

Detecting extra published method data
I’ve added some DEBUG code to GetNextPublishedMethod that raises an exception if it encounters a TPublishedMethod record whose Size field indicates that the record contains additional data after the packed Name string.

function GetNextPublishedMethod(AClass: TClass;
  PublishedMethod: PPublishedMethod): PPublishedMethod;
{$IFDEF DEBUG}
var
  ExpectedSize: integer;
{$ENDIF}
begin
  Result := PublishedMethod;
  // Check for nil first, so the DEBUG code never dereferences a nil pointer
  if not Assigned(Result) then
    Exit;
{$IFDEF DEBUG}
  ExpectedSize := SizeOf(Result.Size)
    + SizeOf(Result.Address)
    + SizeOf(Result.Name[0])
    + Length(Result.Name);
  if Result.Size <> ExpectedSize then
    raise Exception.CreateFmt(
      'RTTI for the published method "%s" of class "%s" ' +
      'has %d extra bytes of unknown data!',
      [Result.Name, AClass.ClassName, Result.Size-ExpectedSize]);
{$ENDIF}
  Inc(PChar(Result), Result.Size);
end;

During my testing with published methods of different calling conventions and number of parameters, this exception has never occurred. Let me know if you find otherwise.

I faintly recalled that Ray Lischner wrote about these extra fields in his excellent Delphi in a Nutshell. I was actually one of the technical editors of that book, so I should remember :-). As Ray documented (see page 74), Delphi 5 (and earlier versions) would encode the parameters of some published methods, more specifically stdcall methods with RTTI-enabled parameter and return types. This half-hearted parameter encoding was probably the remains of some experimental RTTI generation code in the compiler that seems to have been removed from Delphi 7 and 2006.

6 comments:

Atle Smelvær said...

Nice article. The one thing that I still don't like about DevCo's RTTI is the fact that you cannot hide RTTI methods inside private and protected sections. The effects of this is the horrible public components and event handlers you get for all design forms etc. With adjustments to have "private published" and "protected published" they could eliminate this problem, and give way for new persistence possibilities.

Could you blog about this?

Look at QC 26801 and QC 26833 for more information.

Hallvard Vassbotn said...

Hi Atle,

""private published" and "protected published" they could eliminate this problem, and give way for new persistence possibilities."

I can see the usefulness of being able to remove the publicness of the TForm published fields and methods, but what "new persistence possibilities" are you thinking about?

"Look at QC 26801 and QC 26833 for more information."

I have at least opened these for you now.

Anonymous said...

Hi Hallvard,

your XML feed is not working. I am reading your blog via an RSS aggregator and had been using the XML link. Somehow the XML link did not show updates for quite a while! The Atom link works fine.

Jan

Atle Smelvær said...

Hallvard:
I can see the usefulness of being able to remove the publicness of the TForm published fields and methods, but what "new persistence possibilities" are you thinking about?

Answer:
Many objects hide internal values and only want to show parts of them to the public or only public handler routines to work with other objects using the internal values and objects. If you want to save these using TReader and TWriter, you need to override "DefineProperties" of your TPersistent descendant and code all loading and saving. If they open up for publishing internal values and objects, it could save quite some coding in some cases and help clean up the object persistence more.

Thank you for opening the reports.

Another thing they should adjust for, is multiple references to the same objects. This could be adressed in TWriter/TReader with an internal hashlist and giving all objects an object ID when saved. I adjusted OmniXML's OmniXMLPersistent to make it handle this, and that worked very well. So I'm just waiting for these new RTTI features to expand it further.

.NET serialization handles multiple references, so it should be natural for Delphi Win32 to follow up on this.

I'll see if I get the time to create a QC report on this.

If you want to see the adjustments I made, I'll send it to you.

David Glassborow said...

Hi Hallvard, I've just posted a couple of blog entries on some of the new richer RTTI available in Delphi.

http://davidglassborow.blogspot.com/

I'd be interested in your comments.

Hallvard Vassbotn said...

Hi David,

That is very interesting! I recall looking at this stuff quickly several years ago (in the D7 era, when $METHODINFO was undocumented), but put it on the backburner and forgot to look into it in more detail.

A great blog you have there:

http://davidglassborow.blogspot.com/

Inspired by your article, I spent the tram-time yesterday looking at Borland's implementation for Websnap of dynamically calling these methods.

I will blog more about your findings later. Keep up the good work!



Copyright © 2004-2007 by Hallvard Vassbotn