Sunday, May 28, 2006

David Glassborow on extended RTTI

It turns out the story (part one, two and three) about getting RTTI for published method parameters isn’t over yet :-).

In Delphi 7, Borland extended the RTTI capabilities in order to support SOAP and WebSnap by introducing the (then undocumented) $METHODINFO compiler directive. We’ll look at this in more detail later, but in the meantime make sure you go to David Glassborow’s blog to read his great posts on Interface RTTI and Class RTTI!

He has even written a DetailedRTTI unit that can return a string representation of a method’s signature – without any event property hacks (but only for classes compiled with $METHODINFO ON). Note that his unit has some interesting record helpers and class helpers (so you need Delphi 2006 to compile it) – read about David’s view on class helpers as a design tool here.

Updated (27. Oct 2007): $METHODINFO was first available in Delphi 7, not Delphi 6.

Saturday, May 20, 2006

Published fields details

In the previous article we looked at how published fields are used by the IDE and VCL to make component references easy to use and to find class references from component type strings. Now we want to dig deeper down into the implementation details of published fields. Starting with analyzing the assembly code in TObject.FieldAddress I was able to reconstruct these approximate Pascal structures:

type
  TPublishedField = packed record
    Offset: Integer;
    Filler: word; // ??
    Name: {packed} Shortstring; // really string[Length(Name)]
  end;
  PPft = ^TPft;
  TPft = packed record
    Count: Word;
    Filler: LongWord; // ??
    Fields: array[0..High(Word)-1] of TPublishedField; // really [0..Count-1]
  end;

  PVmt = ^TVmt;
  TVmt = packed record
    // ...
    FieldTable : PPft;
    // ...
  end;

The FieldTable field in the TVmt structure we’re reverse engineering is defined as a PPft, a pointer to a published field table. The Pft starts with a 2-byte count, followed by four unknown bytes that TObject.FieldAddress skips, and then an array of variable-length TPublishedField records. As in other RTTI structures, the shortstring fields are packed so that they only take up enough space to hold a length byte and the name string. The TPublishedField record contains an Offset into the object instance where the field can be found, 2 unknown bytes and the packed shortstring with the name of the field. We’ll figure out the meaning of these unknown fields shortly.

Luckily, the GetFieldClassTable routine in the implementation section of the Classes unit (which we discussed in the last article) documents clearly that the Filler field of the TPft record points to a list of class references. With this information we can update our structures.

type
  PClass = ^TClass;
  PPublishedFieldTypes = ^TPublishedFieldTypes;
  TPublishedFieldTypes = packed record
    TypeCount: word;
    Types: array[0..High(Word)-1] of PClass; // really [0..TypeCount-1]
  end;
  TPft = packed record
    Count: Word;
    FieldTypes: PPublishedFieldTypes;
    Fields: TPublishedFields; // really [0..Count-1]
  end;

Now we have identified the FieldTypes field that points to a record with a TypeCount and an array of class references. Note that the class references have an extra level of indirection. TClass references are already pointers, but the array actually contains pointers to TClass references. The reason for this is to support RTTI info and TClass VMTs that reside in different modules (packages). We see the same indirection by pointer in the TypInfo unit’s use of PTypeInfo pointers, in the implementation of global variables across units and the InstanceSize and Parent (class) fields of the TVmt. The Delphi package support code generated by the linker automatically fixes up these pointers after all static packages have been loaded by the application.

We still have the unknown filler field in the TPublishedField record. When I first started to write test code and dump this field from the RTTI of selected test classes, it looked like a sequential field index as the values started at 0 and increased steadily; 1, 2, 3. But when I added a second published TObject field, the next index was 0. Hmm. Combined with the seemingly missing link to the FieldTypes array I quickly realized that the unknown TPublishedField field was an index into the type reference array.

This also confirmed that the FieldTypes array only contains unique class references. If you have 10 published TLabel fields, there will be only 1 TLabel reference in the FieldTypes array. For large forms with many components of the same type, this saves a little space in the TPublishedField record – each type index is only 2 bytes, while a direct TClass reference would take up 4 bytes. More importantly, the FieldTypes array can now be used to quickly translate from a component name string into a class reference, without wasting time scanning through duplicate class references. As we saw in the last article, this is just what the private TReader.FindComponentClass method does.
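
The de-duplication described above can be mimicked in a few lines. This is only a sketch of the idea, not the compiler's actual code; the Types array and the TypeIndexOf helper are names made up for this illustration. Feeding it eight field types drawn from three distinct classes yields a three-entry table and a repeating pattern of type indexes.

```pascal
program UniqueTypeTable;
{$APPTYPE CONSOLE}
uses
  Classes;

var
  // Hypothetical stand-in for the FieldTypes array: unique class refs only
  Types: array of TClass;

// Returns the index of AClass in Types, appending it if it is not there yet.
function TypeIndexOf(AClass: TClass): Integer;
begin
  for Result := 0 to High(Types) do
    if Types[Result] = AClass then
      Exit;
  SetLength(Types, Length(Types) + 1);
  Result := High(Types);
  Types[Result] := AClass;
end;

begin
  // Same sequence of field types as a class declaring
  // TObject, TComponent, TObject, TList twice in a row
  Writeln(TypeIndexOf(TObject));     // 0
  Writeln(TypeIndexOf(TComponent));  // 1
  Writeln(TypeIndexOf(TObject));     // 0
  Writeln(TypeIndexOf(TList));       // 2
  Writeln(TypeIndexOf(TObject));     // 0
  Writeln(TypeIndexOf(TComponent));  // 1
  Writeln(TypeIndexOf(TObject));     // 0
  Writeln(TypeIndexOf(TList));       // 2
  Writeln('Unique types: ', Length(Types)); // Unique types: 3
end.
```

Each field then only needs to carry the small index, while the class reference itself is stored once per distinct type.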

After digging through and figuring out the meaning of all the fields, we now have the following type declarations.

type
  PClass = ^TClass;
  PPublishedField = ^TPublishedField;
  TPublishedField = packed record
    Offset: Integer;
    TypeIndex: word; // Index into the FieldTypes array below
    Name: {packed} Shortstring; // really string[Length(Name)]
  end;
  PPublishedFieldTypes = ^TPublishedFieldTypes;
  TPublishedFieldTypes = packed record
    TypeCount: word;
    Types: array[0..High(Word)-1] of PClass; // really [0..TypeCount-1]
  end;
  TPublishedFields = packed array[0..High(Word)-1] of TPublishedField;
  PPft = ^TPft;
  TPft = packed record
    Count: Word;
    FieldTypes: PPublishedFieldTypes;
    Fields: TPublishedFields; // really [0..Count-1]
  end;

Apart from the FieldTypes array and the TypeIndex field, this looks strikingly similar to the RTTI structures for published methods. To get a kick-start with writing the utility routines to search and iterate the field table structures I simply used the age-old copy-and-paste and search-and-replace trick.

function GetPft(AClass: TClass): PPft;
var
Vmt: PVmt;
begin
Vmt := GetVmt(AClass);
if Assigned(Vmt)
then Result := Vmt.FieldTable
else Result := nil;
end;

function GetPublishedFieldCount(AClass: TClass): integer;
var
Pft: PPft;
begin
Pft := GetPft(AClass);
if Assigned(Pft)
then Result := Pft.Count
else Result := 0;
end;

The cryptically named GetPft function returns a pointer to the published field table given a class reference. It uses the GetVmt function to get a pointer to the “magic” part of the virtual method table (VMT) and then simply returns the value of the FieldTable field. The GetPublishedFieldCount function returns the number of published fields in a given class reference (not including the fields of parent classes).

The routines to iterate the published fields of a class, using both index-based lookup and GetFirst/GetNext-based iteration, also converted cleanly.

function GetNextPublishedField(AClass: TClass;
  PublishedField: PPublishedField): PPublishedField;
begin
  Result := PublishedField;
  if Assigned(Result) then
    Inc(PChar(Result), SizeOf(Result.Offset)
      + SizeOf(Result.TypeIndex)
      + SizeOf(Result.Name[0])
      + Length(Result.Name));
end;

// FieldIndex is the zero-based position of the field in the table
// (not to be confused with TPublishedField.TypeIndex).
function GetPublishedField(AClass: TClass;
  FieldIndex: integer): PPublishedField;
var
  Pft: PPft;
begin
  Pft := GetPft(AClass);
  if Assigned(Pft) and (FieldIndex >= 0) and (FieldIndex < Pft.Count) then
  begin
    Result := @Pft.Fields[0];
    while FieldIndex > 0 do
    begin
      Result := GetNextPublishedField(AClass, Result);
      Dec(FieldIndex);
    end;
  end
  else
    Result := nil;
end;

function GetFirstPublishedField(AClass: TClass): PPublishedField;
begin
  Result := GetPublishedField(AClass, 0);
end;

The only real difference here is that the TPublishedField record does not contain a field with the explicit size of the variable-sized record (as is the case with TPublishedMethod). Instead we must use the size of the fixed fields plus the length of the name field to move the current pointer to the next record in the array. As before, the caller is responsible for calling GetNextPublishedField the correct number of times (using GetPublishedFieldCount).
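
To see the pointer arithmetic in isolation, here is a minimal sketch that builds two packed records by hand in a byte buffer and walks them with the same size formula. The buffer, TPackedField, WriteField and NextField are hypothetical names used only for this illustration; no real RTTI is touched, and PAnsiChar is used for the byte arithmetic so the result does not depend on the size of Char.

```pascal
program WalkPackedFields;
{$APPTYPE CONSOLE}

type
  PPackedField = ^TPackedField;
  TPackedField = packed record
    Offset: Integer;
    TypeIndex: Word;
    Name: ShortString; // only Length(Name) + 1 bytes are really stored
  end;

// Writes one packed record at Dest and returns the address just past it.
function WriteField(Dest: PAnsiChar; AOffset: Integer; ATypeIndex: Word;
  const AName: ShortString): PAnsiChar;
var
  Field: PPackedField;
begin
  Field := PPackedField(Dest);
  Field^.Offset := AOffset;
  Field^.TypeIndex := ATypeIndex;
  Move(AName[0], Field^.Name[0], Length(AName) + 1); // length byte + chars
  Result := Dest + SizeOf(Integer) + SizeOf(Word) + 1 + Length(AName);
end;

// Fixed fields + length byte + actual name length gives the next record.
function NextField(Field: PPackedField): PPackedField;
begin
  Result := PPackedField(PAnsiChar(Field)
    + SizeOf(Field^.Offset)
    + SizeOf(Field^.TypeIndex)
    + SizeOf(Field^.Name[0])
    + Length(Field^.Name));
end;

var
  Buffer: array[0..63] of Byte;
  Field: PPackedField;
  i: Integer;
begin
  FillChar(Buffer, SizeOf(Buffer), 0);
  // First record takes 4 + 2 + 1 + 1 = 8 bytes, the second 4 + 2 + 1 + 8 = 15
  WriteField(WriteField(@Buffer[0], 4, 0, 'A'), 8, 1, 'LongName');

  Field := @Buffer[0];
  for i := 0 to 1 do
  begin
    Writeln(Field^.Name, ' Offs=', Field^.Offset, ' Index=', Field^.TypeIndex);
    Field := NextField(Field);
  end;
  // Prints:
  //   A Offs=4 Index=0
  //   LongName Offs=8 Index=1
end.
```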

Then we have the searching routines that find a specific published field given different searching criteria, such as field name, field offset or field address. These use the iteration functions above. If successful they return a pointer to the relevant TPublishedField record inside the RTTI structures, otherwise they return nil.

function FindPublishedFieldByName(AClass: TClass; 
const AName: ShortString): PPublishedField;
var
i : integer;
begin
while Assigned(AClass) do
begin
Result := GetFirstPublishedField(AClass);
for i := 0 to GetPublishedFieldCount(AClass)-1 do
begin
// Note: Length(ShortString) expands to efficient inline code
if (Length(Result.Name) = Length(AName)) and
(StrLIComp(@Result.Name[1], @AName[1], Length(AName)) = 0) then
Exit;
Result := GetNextPublishedField(AClass, Result);
end;
AClass := AClass.ClassParent;
end;
Result := nil;
end;

function FindPublishedFieldByOffset(AClass: TClass;
AOffset: Integer): PPublishedField;
var
i : integer;
begin
while Assigned(AClass) do
begin
Result := GetFirstPublishedField(AClass);
for i := 0 to GetPublishedFieldCount(AClass)-1 do
begin
if Result.Offset = AOffset then
Exit;
Result := GetNextPublishedField(AClass, Result);
end;
AClass := AClass.ClassParent;
end;
Result := nil;
end;

function FindPublishedFieldByAddr(Instance: TObject;
AAddr: Pointer): PPublishedField;
begin
Result := FindPublishedFieldByOffset(Instance.ClassType,
PChar(AAddr) - PChar(Instance));
end;

Working directly with the TPublishedField pointers returned by the three functions above can be a little awkward, so I’ve also written a few wrapper routines that return simple values for the offset, address and name of a field in a given class or object reference.

function FindPublishedFieldOffset(AClass: TClass; 
const AName: ShortString): integer;
var
Field: PPublishedField;
begin
Field := FindPublishedFieldByName(AClass, AName);
if Assigned(Field)
then Result := Field.Offset
else Result := -1;
end;

function FindPublishedFieldAddr(Instance: TObject;
const AName: ShortString): PObject;
var
Offset: integer;
begin
Offset := FindPublishedFieldOffset(Instance.ClassType, AName);
if Offset >= 0
then Result := PObject(PChar(Instance) + Offset)
else Result := nil;
end;

function FindPublishedFieldName(AClass: TClass;
AOffset: integer): Shortstring; overload;
var
Field: PPublishedField;
begin
Field := FindPublishedFieldByOffset(AClass, AOffset);
if Assigned(Field)
then Result := Field.Name
else Result := '';
end;

function FindPublishedFieldName(Instance: TObject;
AAddr: Pointer): Shortstring; overload;
var
Field: PPublishedField;
begin
Field := FindPublishedFieldByAddr(Instance, AAddr);
if Assigned(Field)
then Result := Field.Name
else Result := '';
end;

Finally I wrote some routines to return the type, address and value of a published field, once you have a proper TPublishedField pointer in hand. These are useful when you are writing your own functions that iterate the published fields of a class.

function GetPublishedFieldType(AClass: TClass; Field: PPublishedField): TClass;
var
Pft: PPft;
begin
Pft := GetPft(AClass);
if Assigned(Pft) and Assigned(Field) and (Field.TypeIndex < Pft.FieldTypes.TypeCount)
then Result := Pft.FieldTypes.Types[Field.TypeIndex]^
else Result := nil;
end;

function GetPublishedFieldAddr(Instance: TObject; Field: PPublishedField): PObject;
begin
if Assigned(Field)
then Result := PObject(PChar(Instance) + Field.Offset)
else Result := nil;
end;

function GetPublishedFieldValue(Instance: TObject; Field: PPublishedField): TObject;
var
FieldAddr: PObject;
begin
FieldAddr := GetPublishedFieldAddr(Instance, Field);
if Assigned(FieldAddr)
then Result := FieldAddr^
else Result := nil;
end;

Phew! Lots of boring plumbing code there. With that under our belt we can write a reverse engineering function that dumps a reconstructed Pascal class declaration containing all the published fields of a class.

procedure DumpPublishedFields(AClass: TClass); overload;
var
i : integer;
Count: integer;
Field: PPublishedField;
FieldType: TClass;
ParentClass: string;
begin
while Assigned(AClass) do
begin
Count := GetPublishedFieldCount(AClass);
if Count > 0 then
begin
if AClass.ClassParent <> nil
then ParentClass := '('+AClass.ClassParent.ClassName+')'
else ParentClass := '';
writeln('type');
writeln(' ', AClass.ClassName, ' = class', ParentClass);
writeln(' published');
Field := GetFirstPublishedField(AClass);
for i := 0 to Count-1 do
begin
FieldType := GetPublishedFieldType(AClass, Field);
writeln(Format(' %s: %s; // Offs=%d, Index=%d',
[Field.Name, FieldType.ClassName, Field.Offset, Field.TypeIndex]));
Field := GetNextPublishedField(AClass, Field);
end;
writeln(' end;');
writeln;
end;
AClass := AClass.ClassParent;
end;
end;

Just for kicks I wrote a corresponding dumping routine for an object instance that also writes the current value for each field – it is more or less identical to the code above with the addition of a call to GetPublishedFieldValue to get the value of the field in the given instance. Then to test the code, I wrote this:

type
{$M+}
TMyClass = class
published
A: TObject;
LongName: TComponent;
B: TObject;
C: TList;
A2: TObject;
L2ongName: TComponent;
B2: TObject;
C2: TList;
end;

procedure Test;
begin
DumpPublishedFields(TMyClass);
end;

And the output is:

type
TMyClass = class(TObject)
published
A: TObject; // Offs=4, Index=0
LongName: TComponent; // Offs=8, Index=1
B: TObject; // Offs=12, Index=0
C: TList; // Offs=16, Index=2
A2: TObject; // Offs=20, Index=0
L2ongName: TComponent; // Offs=24, Index=1
B2: TObject; // Offs=28, Index=0
C2: TList; // Offs=32, Index=2
end;

Well, that was a lot of fun! :-)

Now we have documented three of the more interesting undocumented VMT fields that point to RTTI information generated by the compiler:

  TVmt = packed record
    // ..
    FieldTable : PPft;
    MethodTable : PPmt;
    DynamicTable : PDmt;
    // ..
  end;

There are still four fields we haven’t looked at yet:

  TVmt = packed record
    // ..
    IntfTable : Pointer;
    AutoTable : Pointer;
    InitTable : Pointer;
    TypeInfo : Pointer;
    // ..
  end;

If time and interest permit, we might look at these in upcoming articles.

Acknowledgement. Note that Ray Lischner has documented most of these RTTI structures in his excellent Delphi in a Nutshell book. I'm digging and reverse engineering these structures independently, but it is fun to confirm my findings with what Ray wrote.

Sunday, May 14, 2006

Published fields

In our little series about reverse engineering the undocumented fields of the Delphi VMT, we have come to the FieldTable field. This field points to structures that describe the published fields of a class. In Delphi, published fields must be object references and are mainly used by forms and datamodules to store component references in logically named and easy to use fields (the alternative would be to use the Components property array with specific index values and casting).

The Delphi RTL only contains a single exposed method that accesses the field table, TObject.FieldAddress. This method returns the address of a published field given the field name.

function TObject.FieldAddress(const Name: ShortString): Pointer;
asm
// ...
MOV ESI,[EAX].vmtFieldTable
// ...
end;

This method is used by the component system, in the implementation of the private TComponent.SetReference method, to search the component’s owner for a field that matches the name of the component. If the owned component finds a correctly named published field in its owner, it will assign the field the component’s reference or nil, depending on whether the component has just been added to or removed from the owner.

procedure TComponent.SetReference(Enable: Boolean);
var
Field: ^TComponent;
begin
if FOwner <> nil then
begin
Field := FOwner.FieldAddress(FName);
if Field <> nil then
if Enable then Field^ := Self else Field^ := nil;
end;
end;

procedure TComponent.InsertComponent(AComponent: TComponent);
begin
// …
AComponent.SetReference(True);
// …
end;

procedure TComponent.RemoveComponent(AComponent: TComponent);
begin
// …
AComponent.SetReference(False);
// …
end;

procedure TComponent.SetName(const NewName: TComponentName);
begin
// …
SetReference(False);
ChangeName(NewName);
SetReference(True);
// …
end;

This is how the component fields in your form class automagically get their values. And if you free a component at runtime, the corresponding field is automagically cleared to nil. Pretty neat, huh?! :-).
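
The effect is easy to observe outside the IDE. The following console sketch relies only on the SetName and RemoveComponent behavior shown above; TOwnerDemo and its Child field are hypothetical names chosen for this example.

```pascal
program AutoFieldDemo;
{$APPTYPE CONSOLE}
uses
  Classes;

type
  // Hypothetical owner class. TComponent inherits $M+ from TPersistent,
  // so the published field gets an entry in the field table.
  TOwnerDemo = class(TComponent)
  published
    Child: TComponent;
  end;

var
  OwnerComp: TOwnerDemo;
  Comp: TComponent;
begin
  OwnerComp := TOwnerDemo.Create(nil);
  try
    Comp := TComponent.Create(OwnerComp);
    Comp.Name := 'Child';             // SetName ends with SetReference(True)
    Writeln(OwnerComp.Child = Comp);  // TRUE: the field was filled in for us
    Comp.Free;                        // RemoveComponent calls SetReference(False)
    Writeln(OwnerComp.Child = nil);   // TRUE: the field was cleared again
  finally
    OwnerComp.Free;
  end;
end.
```

Note that nothing in the program assigns OwnerComp.Child directly; the name match between the component and the published field does all the work.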

We’ll look at the exact layout of the field table shortly, but to make FieldAddress work the field table must contain the name of each field and the offset into the object instance where it resides. Note that the field table is part of the VMT and thus part of the class, not a specific object instance. This is why the field table cannot contain the actual address of the field, only the offset. The offset must be combined with (added to) the object instance address to get the true address of the field at runtime.
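
A small sketch of that relationship, using only the documented TObject.FieldAddress method: subtracting the instance address from the returned field address recovers the per-class offset. TDemo and its fields are hypothetical; the actual offset value depends on the compiler version and target platform, so the program prints it rather than assuming it.

```pascal
program FieldOffsetDemo;
{$APPTYPE CONSOLE}

type
  {$M+}
  TDemo = class
  published
    First: TObject;
    Second: TObject;
  end;
  {$M-}

var
  Demo: TDemo;
  Addr: Pointer;
  Offset: Integer;
begin
  Demo := TDemo.Create;
  try
    // Address = instance address + offset stored in the field table
    Addr := Demo.FieldAddress('Second');
    // PAnsiChar subtraction yields a byte count regardless of SizeOf(Char)
    Offset := PAnsiChar(Addr) - PAnsiChar(Demo);
    Writeln('Offset of Second: ', Offset);
    Writeln(Addr = @Demo.Second); // TRUE: same address as the real field
  finally
    Demo.Free;
  end;
end.
```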

There is another, private, routine that also accesses the field table. The Classes unit contains a BASM routine in the implementation section called GetFieldClassTable. This routine accesses a different part of the field table – one that contains class references.

function GetFieldClassTable(AClass: TClass): PFieldClassTable; 
asm
MOV EAX,[EAX].vmtFieldTable
// …
end;

This routine is part of some of the innermost private implementation details of the TReader streaming logic. The nested calls that end up in GetFieldClassTable start with the public TReader.ReadComponent method and look like this:

TReader.ReadComponent
     CreateComponent (nested routine)
          FindComponentClass
               GetFieldClass
                    GetFieldClassTable
     FindExistingComponent (nested routine)
          FindComponentClass
               GetFieldClass
                    GetFieldClassTable

The FindExistingComponent logic handles visually inherited forms, datamodules and frames. CreateComponent creates a new component read from a DFM stream, given a string with the component class name. FindComponentClass operates like a local mapping of class name strings to runtime TClass references. Instead of using the heavy-duty global routine GetClass that is used by Delphi at design-time, FindComponentClass first limits its search to the list of unique class types of declared published fields. You see, in addition to the name-offset association, the field table also contains a list of all the unique class types used for published fields in the owner class.

When you design a form, there is a little-known trick: you can delete the field declarations of components you never reference from code. Alternatively, you can simply clear the Name property of the component. This will make the component unnamed and the IDE will remove the field declaration for you. These tricks will slightly reduce the size of the DFM and slightly improve the form load performance at runtime.

You have to be careful when performing this trick, however. You must keep at least one published field of each component type on the form, otherwise it will not stream in from the DFM properly – giving you an error message like this:
---------------------------
Debugger Exception Notification
---------------------------
Project richedit.exe raised exception class EClassNotFound with message 'Class TLabel not found'. Process stopped. Use Step or Run to continue.
---------------------------
OK   Help  
---------------------------

Or outside the debugger:
---------------------------
Rich Edit Control Demo
---------------------------
Class TLabel not found.
---------------------------
OK  
---------------------------

You should now see the reason why you get this error. The TReader class uses the list of published field types to convert from a class name string to a proper TComponent class reference. If the class reference is not present in the form class’ field table RTTI, TReader falls back to the (potentially) slower GetClass mechanism, which only knows about registered classes. If that lookup fails too, TReader is unable to create the component, and it resorts to raising the EClassNotFound exception you saw above. This means that, as an alternative to keeping one published field of each component class, you can call RegisterClass on the component class in an initialization section.

//…
initialization
RegisterClass(TLabel);
end.

Then you don’t need any TLabel fields in the form class.
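
The name-to-class registry behind that fallback can be tried on its own. This is a minimal sketch using the Classes unit routines GetClass, RegisterClass and UnRegisterClass; TDemoWidget is a hypothetical component class standing in for TLabel so that the example does not need the VCL.

```pascal
program RegisterClassDemo;
{$APPTYPE CONSOLE}
uses
  Classes;

type
  TDemoWidget = class(TComponent); // hypothetical component class

begin
  // Before registration the global lookup knows nothing about the class
  Writeln(GetClass('TDemoWidget') = nil);          // TRUE
  RegisterClass(TDemoWidget);
  // After registration the class name string maps to the class reference
  Writeln(GetClass('TDemoWidget') = TDemoWidget);  // TRUE
  UnRegisterClass(TDemoWidget);
end.
```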

Ok, that should give you some background on why Delphi supports published fields, what kind of RTTI information is stored about them and how the VCL exploits them to perform its design time and DFM streaming magic. The assembly code in TObject.FieldAddress and GetFieldClassTable, along with some helpful type declarations inside the Classes unit implementation section, gives us some clues about how the field table RTTI structures are laid out in memory.

In the next blog post we’ll dive deeper down and write some Pascal data structures and utility methods to find and iterate the published fields and their types. Stay tuned!

Thursday, May 11, 2006

Hack #10: Getting the parameters of published methods

This hack is not normally very useful, but inspired by a comment on the first published methods article, I started to investigate how it could be done. Recall that the compiler currently does not encode the method signature when generating RTTI for published methods – only the code address and name string are stored.

So initially, it seems impossible to obtain this information. But let’s backtrack and think about how the IDE handles events and published methods at design-time. If you already have a number of event handlers defined – implemented in a number of published methods on the form – the Object Inspector will filter and show the assignment compatible methods in a drop down list of each component’s events. How does the IDE know what methods it should display in this list – and what methods to filter out?

Well, since each design-time component is compiled into a package and registered with the IDE, the IDE has full access to the RTTI of the component. As we will (probably) see in an upcoming article, each published event property (OnClick, OnSelected etc) is described by the compiler with RTTI that includes information about the parameters of the event. From this information the IDE knows the number and types of parameters an assignment compatible method must have. It also uses the event’s RTTI to build a correct signature when you double click the event to declare and assign a new method to it.

But it still doesn’t have access to any parameter RTTI for the published methods of the form. In fact, it doesn’t have access to a compiled representation of the form at all. It does have a trump card up its sleeve, though: it has full access to the source code of the form. The IDE “simply” parses the form source code and finds methods that have the correct number and types of parameters. This parsing is not perfect and it will not always be able to evaluate alias type declarations, so typically the parameter types used must be a verbatim copy of the types used in the event type declaration.

That doesn’t help us very much – we don’t have access to the form’s source code at runtime. As the omnipresent Anonymous pointed out in his comment to the Published Methods article, there are Delphi decompilers that are able to determine the parameter types of published methods in a form declaration. How do they accomplish that? Well, there are two clues – the form typically contains a list of published fields; these are the component references that the streaming system automatically assigns as it loads a .DFM. These fields have RTTI that includes the class type of the component. In addition the form has a Components array property containing references to all the components and controls owned by the form. By using either of these, we get access to all the components associated with the form.

These components will typically have one or more event properties assigned to methods of the form. If these events were assigned at design time, the methods they point to will be published. All component event properties that can be assigned at design time must also be published. The compiler provides RTTI for such events – including information about the parameters of the event – and thus the parameters of the assignment compatible published method that is assigned to the event.

Things will be slightly more complicated for a static Delphi decompiler, but the basic chain of information that must be back-tracked is the same. Writing a decompiler is outside the scope of this article (it is left as an exercise for the reader <eg>), but let’s try to write some simple code that can figure out the parameters of all published methods that have been assigned to a published event of an owned component.

The basic algorithm would be something like this:

  • We take an Instance and a TStrings as parameters
  • Loop through all the published methods of the object
  • For each published method
      • Loop through all published events
      • Get the value of each event property
      • If the published method address equals the Code value of the event, we have a link
      • Return the parameter RTTI of the event type – these are the parameters also used for the published method
      • Repeat the above steps for each owned component

That sounds straightforward enough. Let’s try to turn it into code.

procedure GetPublishedMethodsWithParameters(Instance: TObject; 
List: TStrings);
var
i : integer;
Method: PPublishedMethod;
AClass: TClass;
Count: integer;
begin
List.BeginUpdate;
try
List.Clear;
AClass := Instance.ClassType;
while Assigned(AClass) do
begin
Count := GetPublishedMethodCount(AClass);
if Count > 0 then
begin
List.Add(Format('Published methods in %s', [AClass.ClassName]));
Method := GetFirstPublishedMethod(AClass);
for i := 0 to Count-1 do
begin
List.Add(PublishedMethodToString(Instance, Method));
Method := GetNextPublishedMethod(AClass, Method);
end;
end;
AClass := AClass.ClassParent;
end;
finally
List.EndUpdate;
end;
end;

GetPublishedMethodsWithParameters is the top level method that uses the utility routines from the previous article to iterate through all published methods of the instance. It adds a string representation of each method to a TStrings list. The conversion from a published method to a string is delegated to the PublishedMethodToString function.

function PublishedMethodToString(Instance: TObject; 
Method: PPublishedMethod): string;
var
MethodSignature: TMethodSignature;
begin
if FindPublishedMethodSignature(Instance,
Method.Address, MethodSignature) then
Result := MethodSignatureToString(Method.Name, MethodSignature)
else
Result := Format('procedure %s(???);', [Method.Name]);
end;

This function first tries to obtain the signature of the method using FindPublishedMethodSignature and if it succeeds it translates the method signature into a string representation using MethodSignatureToString. We’ll look at these routines shortly, but let’s first look at the definition for the method signature record.

  PMethodParam = ^TMethodParam;
  TMethodParam = record
    Flags: TParamFlags;
    ParamName: PShortString;
    TypeName: PShortString;
  end;
  TMethodParamList = array of TMethodParam;
  PMethodSignature = ^TMethodSignature;
  TMethodSignature = record
    MethodKind: TMethodKind;
    ParamCount: Byte;
    ParamList: TMethodParamList;
    ResultType: PShortString;
  end;

These definitions are my own structures to make it easier to access event type RTTI without struggling with variable length records due to packed shortstring fields. My records are a copy of and point to the raw RTTI structures generated by the compiler and exposed by the TypInfo unit. Here are the relevant definitions from TypInfo.

type
  TMethodKind = (mkProcedure, mkFunction, mkConstructor,
    mkDestructor, mkClassProcedure, mkClassFunction,
    { Obsolete }
    mkSafeProcedure, mkSafeFunction);
  TParamFlag = (pfVar, pfConst, pfArray, pfAddress, pfReference, pfOut);
  TParamFlags = set of TParamFlag;
  TTypeData = packed record
    case TTypeKind of
      // ...
      tkMethod: (
        MethodKind: TMethodKind;
        ParamCount: Byte;
        ParamList: array[0..1023] of Char
        {ParamList: array[1..ParamCount] of
          record
            Flags: TParamFlags;
            ParamName: ShortString;
            TypeName: ShortString;
          end;
        ResultType: ShortString});
  end;

Ok. The TTypeData record encodes an event type (a method pointer property) in the following way. The MethodKind field indicates what kind of method this is – AFAICT only two values are currently used – mkProcedure and mkFunction – corresponding to procedure … of object and function … of object declarations, respectively. Then there is a byte containing the number of parameters the method has – limiting the number of parameters in an event type to 255 :-). Then there is a packed array of packed records with information about each parameter; parameter kind (var, const, out, array of), parameter name and type. Following all the parameters is a string with the name of the type that the method returns, if MethodKind was mkFunction.

Since the ParamName, TypeName and ResultType are all encoded as packed shortstrings that are very awkward to deal with, I declared the TMethodParam and TMethodSignature records above. Here is the GetMethodSignature function that converts from a PPropInfo of an event to the easier-to-use TMethodSignature record.

function PackedShortString(Value: PShortstring; 
var NextField{: Pointer}): PShortString; overload;
begin
Result := Value;
PShortString(NextField) := Value;
Inc(PChar(NextField), SizeOf(Result^[0]) + Length(Result^));
end;

function PackedShortString(var NextField{: Pointer}): PShortString; overload;
begin
Result := PShortString(NextField);
Inc(PChar(NextField), SizeOf(Result^[0]) + Length(Result^));
end;

function GetMethodSignature(Event: PPropInfo): TMethodSignature;
type
  PParamListRecord = ^TParamListRecord;
  TParamListRecord = packed record
    Flags: TParamFlags;
    ParamName: {packed} ShortString; // Really string[Length(ParamName)]
    TypeName: {packed} ShortString; // Really string[Length(TypeName)]
  end;
var
  EventData: PTypeData;
  i: integer;
  MethodParam: PMethodParam;
  ParamListRecord: PParamListRecord;
begin
  Assert(Assigned(Event) and Assigned(Event.PropType));
  Assert(Event.PropType^.Kind = tkMethod);
  EventData := GetTypeData(Event.PropType^);
  Result.MethodKind := EventData.MethodKind;
  Result.ParamCount := EventData.ParamCount;
  SetLength(Result.ParamList, Result.ParamCount);
  ParamListRecord := @EventData.ParamList;
  for i := 0 to Result.ParamCount-1 do
  begin
    MethodParam := @Result.ParamList[i];
    MethodParam.Flags := ParamListRecord.Flags;
    MethodParam.ParamName := PackedShortString(
      @ParamListRecord.ParamName, ParamListRecord);
    MethodParam.TypeName := PackedShortString(ParamListRecord);
  end;
  // The result type string is only present for functions
  if Result.MethodKind = mkFunction
  then Result.ResultType := PackedShortString(ParamListRecord)
  else Result.ResultType := nil;
end;

It uses a couple of overloaded helper routines to get at the packed shortstrings and to advance the current record pointer accordingly. I also had to re-declare the packed TParamListRecord, as the version in TypInfo is commented out. We’ll probably scrutinize the PPropInfo structures later – in this context it suffices to say that we are able to get at the interesting information about the event type’s method signature, and return it in an edible and useful format.

Right, now we have two disconnected pieces of code – we have code that loops through all published methods, trying to convert them into describing strings – and we have code to get at the method signature of an event property. Now we have to connect the two pieces of logic to perform something “useful”. There are two missing links: finding an event property that points to a given published method – and converting a method signature record into a readable string.

Looking at the high-level algorithm we defined above, we need to loop through all published events. Here is some code for that:

function FindEventProperty(Instance: TObject; Code: Pointer): PPropInfo;
var
Count: integer;
PropList: PPropList;
i: integer;
Method: TMethod;
begin
Assert(Assigned(Instance));
Count := GetPropList(Instance, PropList);
if Count > 0 then
try
for i := 0 to Count-1 do
begin
Result := PropList^[i];
if Result.PropType^.Kind = tkMethod then
begin
Method := GetMethodProp(Instance, Result);
if Method.Code = Code then
Exit;
end;
end;
finally
FreeMem(PropList);
end;
Result := nil;
end;

This will get a list of all published properties, filtering out the event properties (tkMethod), getting the current event value and checking if it points to a specific code address. If it does, we return the PPropInfo of the event property, otherwise we return nil. This code will only check a single instance, but we need to check all owned components (if the instance happens to be a TComponent) – so let’s write a routine to do that recursively.

function FindEventFor(Instance: TObject; Code: Pointer): PPropInfo;
var
i: integer;
Component: TComponent;
begin
Result := FindEventProperty(Instance, Code);
if Assigned(Result) then Exit;
if Instance is TComponent then
begin
Component := TComponent(Instance);
for i:= 0 to Component.ComponentCount-1 do
begin
Result := FindEventFor(Component.Components[i], Code);
if Assigned(Result) then Exit;
end;
end;
Result := nil;
// TODO: Check published fields system
end;

This function tries to find an event property that is assigned to a specific code address. It searches this instance first and then all of its owned components (if the instance is a component).

Here we use the Components array that all components and controls have, to check whether any of them has an event property that points to the specific code address of interest. As the comment indicates we could also (or instead) have checked the instances referenced by any published fields. Since the RTL does not have any easy-to-use routines to iterate through all published fields and we haven’t got that far in our VMT-digging series yet, I’ve skipped this for now. Besides, the published field references and the Components array references are (mostly) duplicates of each other.

Now we have enough plumbing code to write the final link between the published methods loop and the event searching logic. Here is the FindPublishedMethodSignature function that PublishedMethodToString calls above.

function FindPublishedMethodSignature(Instance: TObject; Code: Pointer; 
var MethodSignature: TMethodSignature): boolean;
var
Event: PPropInfo;
begin
Assert(Assigned(Code));
Event := FindEventFor(Instance, Code);
Result := Assigned(Event);
if Result then
MethodSignature := GetMethodSignature(Event);
end;

This routine first uses the recursive FindEventFor to try and find an event’s PPropInfo that describes the method and if it finds one, it converts the hard-to-use PPropInfo to an easy-to-use TMethodSignature. Finally we only have to write some boilerplate code to convert the binary TMethodSignature record into a human readable string.

function MethodKindString(MethodKind: TMethodKind): string;
begin
case MethodKind of
mkSafeProcedure,
mkProcedure : Result := 'procedure';
mkSafeFunction,
mkFunction : Result := 'function';
mkConstructor : Result := 'constructor';
mkDestructor : Result := 'destructor';
mkClassProcedure: Result := 'class procedure';
mkClassFunction : Result := 'class function';
end;
end;

function MethodParamString(const MethodParam: TMethodParam;
ExoticFlags: boolean = False): string;
begin
if pfVar in MethodParam.Flags then Result := 'var '
else if pfConst in MethodParam.Flags then Result := 'const '
else if pfOut in MethodParam.Flags then Result := 'out '
else Result := '';
if ExoticFlags then
begin
if pfAddress in MethodParam.Flags then Result := '{addr} ' + Result;
if pfReference in MethodParam.Flags then Result := '{ref} ' + Result;
end;
Result := Result + MethodParam.ParamName^ + ': ';
if pfArray in MethodParam.Flags then
Result := Result + 'array of ';
Result := Result + MethodParam.TypeName^;
end;

function MethodParametesString(const MethodSignature:
TMethodSignature): string;
var
i: integer;
MethodParam: PMethodParam;
begin
Result := '';
for i := 0 to MethodSignature.ParamCount-1 do
begin
MethodParam := @MethodSignature.ParamList[i];
Result := Result + MethodParamString(MethodParam^);
if i < MethodSignature.ParamCount-1 then
Result := Result + '; ';
end;
end;

function MethodSignatureToString(const Name: string;
const MethodSignature: TMethodSignature): string;
begin
Result := Format('%s %s(%s)',
[MethodKindString(MethodSignature.MethodKind),
Name,
MethodParametesString(MethodSignature)]);
if Length(MethodSignature.ResultType^) > 0 then
Result := Result + ': ' + MethodSignature.ResultType^;
Result := Result + ';';
end;

Phew! This article is getting long and with a lot of code! But now we have some serious (but pretty useless) reverse engineering code to dig out the parameters of a published method. Note that this only works if the instance (or one of its components) also has a published property that points to the published method. The good news is that this is the case for most existing published methods – such as the event handlers on a TForm instance. The bad news is that this would not be the case for any published methods we would like to call dynamically at runtime (and thus would not be assigned to any events).

If you’re still hanging in there, we can now write some test code to see if this thing works or not.

type
{$M+}
TMyClass = class;
TOnFour = function (A: array of byte; const B: array of byte;
var C: array of byte; out D: array of byte): TComponent of object;
TOnFive = procedure (Component1: TComponent;
var Component2: TComponent;
out Component3: TComponent;
const Component4: TComponent) of object;
TOnSix = function (const A: string; var Two: integer;
out Three: TMyClass; Four: PInteger; Five: array of Byte;
Six: integer): string of object;
TMyClass = class
private
FOnFour: TOnFour;
FOnFive: TOnFive;
FOnSix: TOnSix;
published
function FourthPublished(A: array of byte; const B: array of byte;
var C: array of byte; out D: array of byte): TComponent;
procedure FifthPublished(Component1: TComponent;
var Component2: TComponent;
out Component3: TComponent;
const Component4: TComponent);
function SixthPublished(const A: string; var Two: integer;
out Three: TMyClass; Four: PInteger;
Five: array of Byte; Six: integer): string;
property OnFour: TOnFour read FOnFour write FOnFour;
property OnFive: TOnFive read FOnFive write FOnFive;
property OnSix: TOnSix read FOnSix write FOnSix;
end;

function TMyClass.FourthPublished;
begin
Result := nil;
end;
procedure TMyClass.FifthPublished;
begin
end;
function TMyClass.SixthPublished;
begin
end;

procedure DumpPublishedMethodsParameters(Instance: TObject);
var
i : integer;
List: TStringList;
begin
List := TStringList.Create;
try
GetPublishedMethodsWithParameters(Instance, List);
for i := 0 to List.Count-1 do
writeln(List[i]);
finally
List.Free;
end;
end;

procedure Test;
var
MyClass: TMyClass;
begin
MyClass := TMyClass.Create;
MyClass.OnFour := MyClass.FourthPublished;
MyClass.OnFive := MyClass.FifthPublished;
MyClass.OnSix := MyClass.SixthPublished;
DumpPublishedMethodsParameters(MyClass);
end;

begin
Test;
readln;
end.

When we run this we get:
Published methods in TMyClass

function FourthPublished(A: array of Byte; const B: array of Byte; var C: array of Byte; out D: array of Byte): TComponent;
procedure FifthPublished(Component1: TComponent; var Component2: TComponent; out Component3: TComponent; const Component4: TComponent);
function SixthPublished(const A: String; var Two: Integer; out Three: TMyClass; Four: PInteger; Five: array of Byte; Six: Integer): String;

Looks pretty accurate to me! The test code above is a little contrived – an instance would not assign its event properties to its own methods. A more realistic test case would be a form with numerous published events hooked up at design time. I loaded up the \Demos\RichEdit\RichEdit.dpr project shipped with Delphi 7 (in Delphi 2006 the path is \Demos\DelphiWin32\VCLWin32\RichEdit\RichEdit.bdsproj). On the main form in the remain.pas unit, I added my HVPublishedMethodParams unit to the uses clause and changed the Help | About event handler like this:

procedure TMainForm.HelpAbout(Sender: TObject);
begin
GetPublishedMethodsWithParameters(Self, Editor.Lines);
{ with TAboutBox.Create(Self) do
try
ShowModal;
finally
Free;
end;}
end;

This will dump all published methods of the form to the edit control – trying to match them up to events with RTTI to find parameter information. When I ran the demo app and selected Help | About, the edit control was filled with this text:
Published methods in TMainForm

procedure SelectionChange(Sender: TObject);
procedure FormCreate(Sender: TObject);
procedure ShowHint(???);
procedure FileNew(Sender: TObject);
procedure FileOpen(Sender: TObject);
procedure FileSave(Sender: TObject);
procedure FileSaveAs(Sender: TObject);
procedure FilePrint(Sender: TObject);
procedure FileExit(Sender: TObject);
procedure EditUndo(Sender: TObject);
procedure EditCut(Sender: TObject);
procedure EditCopy(Sender: TObject);
procedure EditPaste(Sender: TObject);
procedure HelpAbout(Sender: TObject);
procedure SelectFont(Sender: TObject);
procedure RulerResize(Sender: TObject);
procedure FormResize(Sender: TObject);
procedure FormPaint(Sender: TObject);
procedure BoldButtonClick(Sender: TObject);
procedure ItalicButtonClick(Sender: TObject);
procedure FontSizeChange(Sender: TObject);
procedure AlignButtonClick(Sender: TObject);
procedure FontNameChange(Sender: TObject);
procedure UnderlineButtonClick(Sender: TObject);
procedure BulletsButtonClick(Sender: TObject);
procedure FormCloseQuery(Sender: TObject; var CanClose: Boolean);
procedure RulerItemMouseDown(Sender: TObject; Button: TMouseButton;
Shift: TShiftState; X: Integer; Y: Integer);
procedure RulerItemMouseMove(Sender: TObject; Shift: TShiftState;
X: Integer; Y: Integer);
procedure FirstIndMouseUp(Sender: TObject; Button: TMouseButton;
Shift: TShiftState; X: Integer; Y: Integer);
procedure LeftIndMouseUp(Sender: TObject; Button: TMouseButton;
Shift: TShiftState; X: Integer; Y: Integer);
procedure RightIndMouseUp(Sender: TObject; Button: TMouseButton;
Shift: TShiftState; X: Integer; Y: Integer);
procedure FormShow(Sender: TObject);
procedure RichEditChange(Sender: TObject);
procedure SwitchLanguage(Sender: TObject);
procedure ActionList2Update(Action: TBasicAction; var Handled: Boolean);

This is pretty much a verbatim copy of the published methods in the interface section of the form unit. We are only missing parameters for the ShowHint method. This is because this published method does not have any design-time event properties pointing to it. Instead it is assigned at runtime to one of Application’s events.

    procedure ShowHint(Sender: TObject);
///…
procedure TMainForm.FormCreate(Sender: TObject);
begin
Application.OnHint := ShowHint;
//…
end;

The TApplication object does not publish any of its properties, so there is no straightforward way of obtaining the ShowHint parameters. In fact, having the ShowHint method as published is a minor flaw – it should be made private instead.

That concludes this intriguing, but AFAICS, useless hack. Now you should have a better understanding of how published methods and published events are interconnected at runtime and how Delphi decompilers can perform some of their magic. We have also illustrated just how much information about your program is stored in the EXE file – you had better make sure you don’t include any sensitive information in your published method names or event parameter names and types :-).

Hope you have enjoyed the ride!

Come get a free sample chapter!

Jon Shemitz has made a chapter called Strings and Files from his upcoming .NET 2.0 for Delphi Programmers book available for download (I was the tech editor of the book). Click this link to go to his book site and download the chapter. You can also take a look at the front matter (including a most generous acknowledgement – thanks Jon!) and the impressive index.

When you have read the great chapter and see what a valuable book this is (read my review here), go and order it!

Tuesday, May 02, 2006

Under the hood of published methods

Now that we have covered what published methods are, how the IDE and VCL use them in .DFM streaming and how to use them polymorphically, we are ready to dive deeper to see how they are implemented under the hood.

If you have been following this series of articles about the polymorphic features of the Delphi language, you will have noticed that the VMT contains a MethodTable field, currently defined as an untyped pointer. By scrutinizing the TObject methods that access this table, MethodName and MethodAddress, I’ve been able to write approximate Pascal declarations to describe the structure of the MethodTable.

type
PPublishedMethod = ^TPublishedMethod;
TPublishedMethod = packed record
Size: word;
Address: Pointer;
Name: {packed} Shortstring; // really string[Length(Name)]
end;
TPublishedMethods = packed array[0..High(Word)-1] of TPublishedMethod;
PPmt = ^TPmt;
TPmt = packed record
Count: Word;
Methods: TPublishedMethods; // really [0..Count-1]
end;

PVmt = ^TVmt;
TVmt = packed record
// …
MethodTable : PPmt;
// …
end;

As you can see above, the published method table now has the type PPmt. It points to a record that contains the number of published methods in this class followed by a packed array of TPublishedMethod records. Each record contains a size (used to find the start of the next record), the address of the method’s code and a packed shortstring containing the name of the method.

Notice that it appears that the Size field would have been unnecessary. In all my testing the value of Size has always been equal to the expression:

  Size :=  SizeOf(Size) + SizeOf(Address) + SizeOf(Name[0]) + Length(Name);

In other words, the next TPublishedMethod record starts just after the last byte of the method name. I’m not sure why Borland decided to add the Size field, but one possible reason might be to be able to extend the contents of the TPublishedMethod record in the future. One natural extension would be to include information about the parameters and calling convention of the method. Then Size would be adjusted accordingly and old code unaware of the new fields would still work fine (see the sidebar Extra published method data below).
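
As a quick worked example of that expression, the sketch below computes the expected record size for a given method name. ExpectedSize and the program name are hypothetical helpers introduced only for this illustration, and the arithmetic in the comments assumes a 32-bit compiler where SizeOf(Pointer) is 4.

```pascal
program PublishedMethodSizeDemo; // hypothetical name, for illustration only
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Mirrors the Size expression above: the Size word, the Address pointer,
// the length byte of the packed shortstring, and the name characters.
function ExpectedSize(const Name: ShortString): Integer;
begin
  Result := SizeOf(Word) + SizeOf(Pointer) + SizeOf(Byte) + Length(Name);
end;

begin
  // On a 32-bit target: 2 + 4 + 1 + 14 = 21 bytes for 'FirstPublished'
  Writeln(ExpectedSize('FirstPublished'));
  // On a 32-bit target: 2 + 4 + 1 + 0 = 7 bytes is the smallest possible record
  Writeln(ExpectedSize(''));
end.
```

So, under those assumptions, a record for a method named FirstPublished occupies 21 bytes, and the next record starts immediately after it.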

Now that we have a few data structures to work with we can start writing some utility routines.

function GetVmt(AClass: TClass): PVmt;
begin
Result := PVmt(AClass);
Dec(Result);
end;

function GetPmt(AClass: TClass): PPmt;
var
Vmt: PVmt;
begin
Vmt := GetVmt(AClass);
if Assigned(Vmt)
then Result := Vmt.MethodTable
else Result := nil;
end;

function GetPublishedMethodCount(AClass: TClass): integer;
var
Pmt: PPmt;
begin
Pmt := GetPmt(AClass);
if Assigned(Pmt)
then Result := Pmt.Count
else Result := 0;
end;

function GetPublishedMethod(AClass: TClass; Index: integer): PPublishedMethod;
var
Pmt: PPmt;
begin
Pmt := GetPmt(AClass);
if Assigned(Pmt) and (Index >= 0) and (Index < Pmt.Count) then
begin
Result := @Pmt.Methods[0];
while Index > 0 do
begin
Inc(PChar(Result), Result.Size);
Dec(Index);
end;
end
else
Result := nil;
end;

First we have our old friend GetVmt to get a pointer to the magic part of the VMT given a class reference. Using this and the new PPmt type we can write the GetPmt function above – this returns a pointer to the class’ published method table. Then there are two functions that return the number of published methods and a specific published method, given an index from 0 to Count-1. Using these utility routines we can now write some test code to dump all the published methods of a class (and its parent classes).

procedure DumpPublishedMethods(AClass: TClass);
var
i : integer;
Method: PPublishedMethod;
begin
while Assigned(AClass) do
begin
writeln('Published methods in ', AClass.ClassName);
for i := 0 to GetPublishedMethodCount(AClass)-1 do
begin
Method := GetPublishedMethod(AClass, i);
writeln(Format('%d. MethodAddr = %p, Name = %s',
[i, Method.Address, Method.Name]));
end;
AClass := AClass.ClassParent;
end;
end;

This dumping works fine, but it has less than ideal performance complexity. The GetPublishedMethod function has to iterate to the Index’th method on each call, giving the dump routine quadratic, or O(n^2), performance complexity (where n is the number of published methods in the class). Now, most classes do not have that many published methods and the work done inside the inner loop is minimal, so in practice you should never experience this as a problem.
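
To put a number on the quadratic behaviour, here is a small counting sketch. It does not touch any RTTI; IndexedSteps is a hypothetical helper that simply adds up how many pointer advances the indexed lookups would perform for a class with N published methods.

```pascal
program StepCountDemo; // hypothetical name, for illustration only
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Total number of record-to-record advances when every method is fetched
// by index: fetching index i walks i records, so the sum is 0+1+...+(N-1).
function IndexedSteps(N: Integer): Integer;
var
  i: Integer;
begin
  Result := 0;
  for i := 0 to N-1 do
    Inc(Result, i);
end;

begin
  Writeln(IndexedSteps(6));   // 0+1+2+3+4+5 = 15
  Writeln(IndexedSteps(100)); // 100*99/2 = 4950
end.
```

For the six-method test class used later in this article that is only 15 advances, which illustrates why the cost rarely matters in practice; a single pass over the table would need at most one advance per method.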

However, my performance obsession makes me want to speed this up, at least theoretically. The packed array of TPublishedMethod records can be seen as a primitive singly linked list – random access is slow – so an iterator-based technique should improve performance. Let’s write two more utility routines.

function GetFirstPublishedMethod(AClass: TClass): PPublishedMethod;
begin
Result := GetPublishedMethod(AClass, 0);
end;

function GetNextPublishedMethod(AClass: TClass; PublishedMethod:
PPublishedMethod): PPublishedMethod;
begin
Result := PublishedMethod;
if Assigned(Result) then
Inc(PChar(Result), Result.Size);
end;

These two routines constitute a typical GetFirst/GetNext pair of iterators. The first method returns a reference to the first published method while the second method returns a reference to the next published method. Notice that it is the responsibility of the caller to call GetNextPublishedMethod the correct number of times (by using GetPublishedMethodCount). Now we can rewrite the dumping method, making it slightly faster.

procedure DumpPublishedMethodsFaster(AClass: TClass);
var
i : integer;
Method: PPublishedMethod;
begin
while Assigned(AClass) do
begin
writeln('Published methods in ', AClass.ClassName);
Method := GetFirstPublishedMethod(AClass);
for i := 0 to GetPublishedMethodCount(AClass)-1 do
begin
writeln(Format('%d. MethodAddr = %p, Name = %s',
[i, Method.Address, Method.Name]));
Method := GetNextPublishedMethod(AClass, Method);
end;
AClass := AClass.ClassParent;
end;
end;

Iterating over or dumping all published methods in a class is not normally very useful. TObject already contains methods to perform published method lookups using MethodAddress and MethodName. These are written in efficient assembly, but that also makes them hard to read. I used them to determine the format of the published method table data structures above. Here are some corresponding routines in Pascal.

function FindPublishedMethodByName(AClass: TClass; 
const AName: ShortString): PPublishedMethod;
var
i : integer;
begin
while Assigned(AClass) do
begin
Result := GetFirstPublishedMethod(AClass);
for i := 0 to GetPublishedMethodCount(AClass)-1 do
begin
// Note: Length(ShortString) expands to efficient inline code
if (Length(Result.Name) = Length(AName)) and
(StrLIComp(@Result.Name[1], @AName[1], Length(AName)) = 0) then
Exit;
Result := GetNextPublishedMethod(AClass, Result);
end;
AClass := AClass.ClassParent;
end;
Result := nil;
end;

function FindPublishedMethodByAddr(AClass: TClass;
AAddr: Pointer): PPublishedMethod;
var
i : integer;
begin
while Assigned(AClass) do
begin
Result := GetFirstPublishedMethod(AClass);
for i := 0 to GetPublishedMethodCount(AClass)-1 do
begin
if Result.Address = AAddr then
Exit;
Result := GetNextPublishedMethod(AClass, Result);
end;
AClass := AClass.ClassParent;
end;
Result := nil;
end;

function FindPublishedMethodAddr(AClass: TClass; const
AName: ShortString): Pointer;
var
Method: PPublishedMethod;
begin
Method := FindPublishedMethodByName(AClass, AName);
if Assigned(Method)
then Result := Method.Address
else Result := nil;
end;

function FindPublishedMethodName(AClass: TClass;
AAddr: Pointer): Shortstring;
var
Method: PPublishedMethod;
begin
Method := FindPublishedMethodByAddr(AClass, AAddr);
if Assigned(Method)
then Result := Method.Name
else Result := '';
end;

The first two functions find a published method by name or address and return a pointer to the TPublishedMethod record that describes the method. Having direct access to this record deep inside the RTTI data structures can come in handy later. The last two functions return the address or name directly and correspond to TObject’s MethodAddress and MethodName, respectively.
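
For comparison, here is a minimal sketch of the built-in TObject class functions that these Pascal routines mirror. TDemo and the program name are made up for this illustration; the only RTL calls used are MethodAddress and MethodName.

```pascal
program MethodLookupDemo; // hypothetical name, for illustration only
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  {$M+}
  TDemo = class
  published
    procedure Hello;
  end;
  {$M-}

procedure TDemo.Hello;
begin
end;

var
  Addr: Pointer;
begin
  // Name -> code address, looked up in the published method table
  Addr := TDemo.MethodAddress('Hello');
  Writeln(Assigned(Addr));
  // Code address -> name, the reverse lookup
  Writeln(TDemo.MethodName(Addr));
  // A name that is not published yields nil
  Writeln(TDemo.MethodAddress('NotThere') = nil);
end.
```

The round trip from name to address and back should return the original method name, which is the same behaviour FindPublishedMethodAddr and FindPublishedMethodName are meant to reproduce.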

Finally we can write a class and some code to test the routines we have written.

type
{$M+}
TMyClass = class
published
procedure FirstPublished;
procedure SecondPublished(A: integer);
procedure ThirdPublished(A: integer); stdcall;
function FourthPublished(A: TComponent): TComponent; stdcall;
procedure FifthPublished(Component: TComponent); stdcall;
function SixthPublished(A: string; Two, Three, Four,
Five, Six: integer): string; pascal;
end;

procedure TMyClass.FirstPublished;
begin
end;
procedure TMyClass.SecondPublished;
begin
end;
procedure TMyClass.ThirdPublished;
begin
end;
function TMyClass.FourthPublished;
begin
Result := nil;
end;
procedure TMyClass.FifthPublished;
begin
end;
function TMyClass.SixthPublished;
begin
end;

procedure DumpMethod(Method: PPublishedMethod);
begin
if Assigned(Method)
then Writeln(Format('%p=%s', [Method.Address, Method.Name]))
else Writeln('nil');
end;

procedure Test;
begin
DumpPublishedMethods(TMyClass);
DumpPublishedMethodsFaster(TMyClass);
DumpMethod(FindPublishedMethodByName(TMyClass, 'FirstPublished'));
DumpMethod(FindPublishedMethodByName(TMyClass,
FindPublishedMethodName(TMyClass, @TMyClass.SecondPublished)));
DumpMethod(FindPublishedMethodByAddr(TMyClass, @TMyClass.ThirdPublished));
DumpMethod(FindPublishedMethodByAddr(TMyClass,
FindPublishedMethodAddr(TMyClass, 'FourthPublished')));
DumpMethod(FindPublishedMethodByAddr(TMyClass,
FindPublishedMethodByName(TMyClass, 'FifthPublished').Address));
DumpMethod(FindPublishedMethodByAddr(TMyClass, @TMyClass.SixthPublished));
DumpMethod(FindPublishedMethodByName(TMyClass, 'NotThere'));
DumpMethod(FindPublishedMethodByAddr(TMyClass, nil));
end;

begin
Test;
readln;
end.

The output from this little test snippet is:

Published methods in TMyClass
0. MethodAddr = 00412BCC, Name = FirstPublished
1. MethodAddr = 00412BD0, Name = SecondPublished
2. MethodAddr = 00412BD4, Name = ThirdPublished
3. MethodAddr = 00412BDC, Name = FourthPublished
4. MethodAddr = 00412BE8, Name = FifthPublished
5. MethodAddr = 00412BF0, Name = SixthPublished
Published methods in TObject
Published methods in TMyClass
0. MethodAddr = 00412BCC, Name = FirstPublished
1. MethodAddr = 00412BD0, Name = SecondPublished
2. MethodAddr = 00412BD4, Name = ThirdPublished
3. MethodAddr = 00412BDC, Name = FourthPublished
4. MethodAddr = 00412BE8, Name = FifthPublished
5. MethodAddr = 00412BF0, Name = SixthPublished
Published methods in TObject
00412BCC=FirstPublished
00412BD0=SecondPublished
00412BD4=ThirdPublished
00412BDC=FourthPublished
00412BE8=FifthPublished
00412BF0=SixthPublished
nil
nil

Detecting extra published method data

I’ve added some DEBUG code to GetNextPublishedMethod that tries to detect, and raise an exception if it encounters, a TPublishedMethod record where the Size field indicates that the record contains additional data after the packed Name string.

function GetNextPublishedMethod(AClass: TClass;
PublishedMethod: PPublishedMethod): PPublishedMethod;
{$IFDEF DEBUG}
var
ExpectedSize: integer;
{$ENDIF}
begin
Result := PublishedMethod;
// Bail out before touching the record if there is nothing to advance from
if not Assigned(Result) then
Exit;
{$IFDEF DEBUG}
ExpectedSize := SizeOf(Result.Size)
+ SizeOf(Result.Address)
+ SizeOf(Result.Name[0])
+ Length(Result.Name);
if Result.Size <> ExpectedSize then
raise Exception.CreateFmt(
'RTTI for the published method "%s" of class "%s" ' +
'has %d extra bytes of unknown data!',
[Result.Name, AClass.ClassName, Result.Size-ExpectedSize]);
{$ENDIF}
Inc(PChar(Result), Result.Size);
end;

During my testing with published methods of different calling conventions and number of parameters, this exception has never occurred. Let me know if you find otherwise.

I recalled faintly that Ray Lischner wrote about these extra fields in his excellent Delphi in a Nutshell. I was actually one of the technical editors of that book – so I should remember :-). As Ray documented (see page 74), Delphi 5 (and earlier versions) would encode the parameters of some published methods – more specifically stdcall methods with RTTI-enabled parameter and return types. This half-hearted parameter encoding was probably the remains of some experimental RTTI generation code in the compiler that seems to have been removed from Delphi 7 and 2006.