Saturday, May 20, 2006

Published fields details

In the previous article we looked at how the IDE and VCL use published fields to make component references easy to use and to find class references from component type strings. Now we want to dig deeper into the implementation details of published fields. By analyzing the assembly code in TObject.FieldAddress, I was able to reconstruct these approximate Pascal structures:

type
  TPublishedField = packed record
    Offset: Integer;
    Filler: word; // ??
    Name: {packed} Shortstring; // really string[Length(Name)]
  end;
  PPft = ^TPft;
  TPft = packed record
    Count: Word;
    Filler: LongWord; //??
    Fields: array[0..High(Word)-1] of TPublishedField; // really [0..Count-1]
  end;

  PVmt = ^TVmt;
  TVmt = packed record
    // ...
    FieldTable : PPft;
    // ...
  end;

The FieldTable field in the TVmt structure we’re reverse engineering is defined as a PPft, a pointer to a published field table. The Pft starts with a 2-byte count, followed by four unknown bytes that TObject.FieldAddress skips, and then an array of variable-length TPublishedField records. As in other RTTI structures, the shortstring fields are packed so that they only take up enough space to hold a length byte and the name string. The TPublishedField record contains an Offset into the object instance where the field can be found, 2 unknown bytes, and the packed shortstring with the name of the field. We’ll figure out the meaning of these unknown fields shortly.

Luckily, the GetFieldClassTable routine in the implementation section of the Classes unit (which we discussed in the last article) clearly documents that the Filler field of the TPft record points to a list of class references. With this information we can update our structures.

type
  PClass = ^TClass;
  PPublishedFieldTypes = ^TPublishedFieldTypes;
  TPublishedFieldTypes = packed record
    TypeCount: word;
    Types: array[0..High(Word)-1] of PClass; // really [0..TypeCount-1]
  end;
  TPft = packed record
    Count: Word;
    FieldTypes: PPublishedFieldTypes;
    Fields: array[0..High(Word)-1] of TPublishedField; // really [0..Count-1]
  end;

Now we have identified the FieldTypes field, which points to a record with a TypeCount and an array of class references. Note that the class references have an extra level of indirection. TClass references are already pointers, but the array actually contains pointers to TClass references. The reason for this is to support RTTI info and TClass VMTs that reside in different modules (packages). We see the same indirection in the TypInfo unit’s use of PPTypeInfo pointers, in the implementation of global variables across units, and in the InstanceSize and Parent (class) fields of the TVmt. The Delphi package support code generated by the linker automatically fixes up these pointers after all static packages have been loaded by the application.
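The effect of the extra indirection is easier to see in a tiny simulation. This is a minimal sketch, not the real package loader: ImportedClassRef, TypeSlot and ResolveType are hypothetical names, and the "fix-up" is just a plain assignment. The point is that the RTTI only needs to store the address of a class reference variable, which can be filled in after the RTTI itself has been emitted, so reading a type takes two dereferences (just like Types[TypeIndex]^ in GetPublishedFieldType).

```pascal
program ClassRefIndirection;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  PClass = ^TClass;
  TFoo = class(TObject)
  end;

var
  // Hypothetical stand-in for the class reference slot that the package
  // loader patches once the module defining the class has been loaded
  ImportedClassRef: TClass = nil;
  // The simulated "RTTI" records only the address of that slot
  TypeSlot: PClass = nil;

// Follows the extra level of indirection: slot address -> slot -> class
function ResolveType(Slot: PClass): TClass;
begin
  if Assigned(Slot) then
    Result := Slot^
  else
    Result := nil;
end;

begin
  TypeSlot := @ImportedClassRef;             // known when the RTTI is emitted
  WriteLn(ResolveType(TypeSlot) = nil);      // TRUE: slot not patched yet
  ImportedClassRef := TFoo;                  // simulated fix-up after loading
  WriteLn(ResolveType(TypeSlot).ClassName);  // TFoo
end.
```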

We still have the unknown filler field in the TPublishedField record. When I first started to write test code and dump this field from the RTTI of selected test classes, it looked like a sequential field index, as the values started at 0 and increased steadily: 1, 2, 3. But when I added a second published TObject field, the next index was 0. Hmm. Combined with the seemingly missing link to the FieldTypes array, I quickly realized that the unknown TPublishedField field was an index into the type reference array.

This also confirmed that the FieldTypes array only contains unique class references. If you have 10 published TLabel fields, there will be only 1 TLabel reference in the FieldTypes array. For large forms with many components of the same type, this saves a little space in the TPublishedField record: each type index is only 2 bytes, while a direct TClass reference would take up 4 bytes. More importantly, the FieldTypes array can now be used to quickly translate a component type name string into a class reference, without wasting time scanning through duplicate class references. As we saw in the last article, this is just what the private TReader.FindComponentClass method does.
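The deduplication can be sketched as follows. This is an assumption about how such a table could be built (indices handed out in order of first appearance), not the compiler's actual code, and AddUniqueType and the other names are hypothetical. Fed the declaration order of the TMyClass test class used at the end of this article, it produces the same Index values as the dump output there, with only three entries in the type table.

```pascal
program TypeTableSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

uses
  Classes; // TComponent, TList

type
  TClassArray = array of TClass;

var
  Types: TClassArray;                  // unique class references, like FieldTypes
  FieldClasses: array[0..7] of TClass; // declared type of each field, in order
  i: Integer;

const
  FieldNames: array[0..7] of string =
    ('A', 'LongName', 'B', 'C', 'A2', 'L2ongName', 'B2', 'C2');

// Returns the index of AClass in ATypes, appending it first if it is new,
// so each distinct class is stored only once
function AddUniqueType(var ATypes: TClassArray; AClass: TClass): Word;
var
  j: Integer;
begin
  for j := 0 to High(ATypes) do
    if ATypes[j] = AClass then
    begin
      Result := j;
      Exit;
    end;
  SetLength(ATypes, Length(ATypes) + 1);
  ATypes[High(ATypes)] := AClass;
  Result := High(ATypes);
end;

begin
  // Same declaration order as the TMyClass test class
  FieldClasses[0] := TObject;
  FieldClasses[1] := TComponent;
  FieldClasses[2] := TObject;
  FieldClasses[3] := TList;
  for i := 4 to 7 do
    FieldClasses[i] := FieldClasses[i - 4];

  for i := 0 to 7 do
    WriteLn(FieldNames[i], ': ', FieldClasses[i].ClassName,
      ' Index=', AddUniqueType(Types, FieldClasses[i]));
  WriteLn('TypeCount=', Length(Types)); // 3
end.
```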

After digging through and figuring out the meaning of all the fields, we now have the following type declarations.

type
  PClass = ^TClass;
  PObject = ^TObject; // used by the field address helpers below
  PPublishedField = ^TPublishedField;
  TPublishedField = packed record
    Offset: Integer;
    TypeIndex: word; // Index into the FieldTypes array below
    Name: {packed} Shortstring; // really string[Length(Name)]
  end;
  PPublishedFieldTypes = ^TPublishedFieldTypes;
  TPublishedFieldTypes = packed record
    TypeCount: word;
    Types: array[0..High(Word)-1] of PClass; // really [0..TypeCount-1]
  end;
  TPublishedFields = packed array[0..High(Word)-1] of TPublishedField;
  PPft = ^TPft;
  TPft = packed record
    Count: Word;
    FieldTypes: PPublishedFieldTypes;
    Fields: TPublishedFields; // really [0..Count-1]
  end;

Apart from the FieldTypes array and the TypeIndex field, this looks strikingly similar to the RTTI structures for published methods. To get a kick-start on writing the utility routines to search and iterate the field table structures, I simply used the age-old copy-and-paste and search-and-replace trick.

function GetPft(AClass: TClass): PPft;
var
  Vmt: PVmt;
begin
  Vmt := GetVmt(AClass);
  if Assigned(Vmt)
  then Result := Vmt.FieldTable
  else Result := nil;
end;

function GetPublishedFieldCount(AClass: TClass): integer;
var
  Pft: PPft;
begin
  Pft := GetPft(AClass);
  if Assigned(Pft)
  then Result := Pft.Count
  else Result := 0;
end;

The cryptically named GetPft function returns a pointer to the published field table given a class reference. It uses the GetVmt function to get a pointer to the “magic” part of the virtual method table (VMT) and then simply returns the value of the FieldTable field. The GetPublishedFieldCount function returns the number of published fields in a given class reference (not including the fields of parent classes).

The routines to iterate the published fields of a class, using both index-based lookup and GetFirst/GetNext-style iteration, also converted cleanly.

function GetNextPublishedField(AClass: TClass;
  PublishedField: PPublishedField): PPublishedField;
begin
  Result := PublishedField;
  if Assigned(Result) then
    Inc(PChar(Result), SizeOf(Result.Offset)
                     + SizeOf(Result.TypeIndex)
                     + SizeOf(Result.Name[0])
                     + Length(Result.Name));
end;

function GetPublishedField(AClass: TClass;
  FieldIndex: integer): PPublishedField;
var
  Pft: PPft;
begin
  Pft := GetPft(AClass);
  // FieldIndex is the position in the field list (0..Count-1), not a TypeIndex
  if Assigned(Pft) and (FieldIndex >= 0) and (FieldIndex < Pft.Count) then
  begin
    Result := @Pft.Fields[0];
    while FieldIndex > 0 do
    begin
      Result := GetNextPublishedField(AClass, Result);
      Dec(FieldIndex);
    end;
  end
  else
    Result := nil;
end;

function GetFirstPublishedField(AClass: TClass): PPublishedField;
begin
  Result := GetPublishedField(AClass, 0);
end;

The only real difference here is that the TPublishedField record does not contain a field with the explicit size of the variable-sized record (as is the case with TPublishedMethod). Instead, we must add the size of the fixed fields to the length of the name field to move the current pointer to the next record in the array. As before, the caller is responsible for calling GetNextPublishedField the correct number of times (using GetPublishedFieldCount).
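To double-check the size arithmetic, here is a small self-contained sketch that writes a few entries in the packed TPublishedField layout into a plain byte buffer (a stand-in for the compiler-generated table) and then walks them the same way GetNextPublishedField does. Buf, BufPos, AppendEntry and EntrySize are hypothetical helpers for this illustration only. An entry named 'LongName' occupies 4 + 2 + 1 + 8 = 15 bytes, while one-letter names take 8 bytes.

```pascal
program WalkPublishedFields;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  PPublishedField = ^TPublishedField;
  TPublishedField = packed record
    Offset: Integer;
    TypeIndex: Word;
    Name: ShortString; // only Length(Name) + 1 bytes are actually stored
  end;

var
  Buf: array[0..255] of Byte; // hand-built stand-in for Pft.Fields
  BufPos, i: Integer;
  Field: PPublishedField;

// Size of one packed entry: fixed fields + length byte + name characters
function EntrySize(AField: PPublishedField): Integer;
begin
  Result := SizeOf(AField.Offset) + SizeOf(AField.TypeIndex)
    + SizeOf(AField.Name[0]) + Length(AField.Name);
end;

// Appends one entry in the packed layout at Buf[BufPos] and advances BufPos
procedure AppendEntry(AOffset: Integer; ATypeIndex: Word; const AName: ShortString);
begin
  Move(AOffset, Buf[BufPos], SizeOf(AOffset));
  Inc(BufPos, SizeOf(AOffset));
  Move(ATypeIndex, Buf[BufPos], SizeOf(ATypeIndex));
  Inc(BufPos, SizeOf(ATypeIndex));
  Move(AName[0], Buf[BufPos], Length(AName) + 1); // length byte + characters
  Inc(BufPos, Length(AName) + 1);
end;

begin
  BufPos := 0;
  AppendEntry(4, 0, 'A');
  AppendEntry(8, 1, 'LongName');
  AppendEntry(12, 0, 'B');
  WriteLn('Total bytes: ', BufPos); // 8 + 15 + 8 = 31

  Field := @Buf[0];
  for i := 0 to 2 do
  begin
    WriteLn(Field.Name, ' Offs=', Field.Offset, ' Index=', Field.TypeIndex,
      ' Size=', EntrySize(Field));
    // Step over this entry to reach the next one
    Field := PPublishedField(PAnsiChar(Field) + EntrySize(Field));
  end;
end.
```

The loop prints Size=8, Size=15 and Size=8 for the three entries; as in the article's routines, the count (here the literal 2 as the upper bound) must come from elsewhere, because nothing in the records marks the end of the list.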

Then we have the searching routines that find a specific published field given different search criteria, such as field name, field offset or field address. These use the iteration functions above and walk up the ClassParent chain, so fields declared in ancestor classes are found too (the name comparison is case-insensitive). If successful they return a pointer to the relevant TPublishedField record inside the RTTI structures, otherwise they return nil.

function FindPublishedFieldByName(AClass: TClass;
  const AName: ShortString): PPublishedField;
var
  i : integer;
begin
  while Assigned(AClass) do
  begin
    Result := GetFirstPublishedField(AClass);
    for i := 0 to GetPublishedFieldCount(AClass)-1 do
    begin
      // Note: Length(ShortString) expands to efficient inline code
      if (Length(Result.Name) = Length(AName)) and
         (StrLIComp(@Result.Name[1], @AName[1], Length(AName)) = 0) then
        Exit;
      Result := GetNextPublishedField(AClass, Result);
    end;
    AClass := AClass.ClassParent;
  end;
  Result := nil;
end;

function FindPublishedFieldByOffset(AClass: TClass;
  AOffset: Integer): PPublishedField;
var
  i : integer;
begin
  while Assigned(AClass) do
  begin
    Result := GetFirstPublishedField(AClass);
    for i := 0 to GetPublishedFieldCount(AClass)-1 do
    begin
      if Result.Offset = AOffset then
        Exit;
      Result := GetNextPublishedField(AClass, Result);
    end;
    AClass := AClass.ClassParent;
  end;
  Result := nil;
end;

function FindPublishedFieldByAddr(Instance: TObject;
  AAddr: Pointer): PPublishedField;
begin
  Result := FindPublishedFieldByOffset(Instance.ClassType,
    PChar(AAddr) - PChar(Instance));
end;

Working directly with the TPublishedField pointers returned by the three functions above can be a little awkward, so I’ve also written a few wrapper routines that return simple values for the offset, address and name of a field in a given class or object reference.

function FindPublishedFieldOffset(AClass: TClass;
  const AName: ShortString): integer;
var
  Field: PPublishedField;
begin
  Field := FindPublishedFieldByName(AClass, AName);
  if Assigned(Field)
  then Result := Field.Offset
  else Result := -1;
end;

function FindPublishedFieldAddr(Instance: TObject;
  const AName: ShortString): PObject;
var
  Offset: integer;
begin
  Offset := FindPublishedFieldOffset(Instance.ClassType, AName);
  if Offset >= 0
  then Result := PObject(PChar(Instance) + Offset)
  else Result := nil;
end;

function FindPublishedFieldName(AClass: TClass;
  AOffset: integer): Shortstring; overload;
var
  Field: PPublishedField;
begin
  Field := FindPublishedFieldByOffset(AClass, AOffset);
  if Assigned(Field)
  then Result := Field.Name
  else Result := '';
end;

function FindPublishedFieldName(Instance: TObject;
  AAddr: Pointer): Shortstring; overload;
var
  Field: PPublishedField;
begin
  Field := FindPublishedFieldByAddr(Instance, AAddr);
  if Assigned(Field)
  then Result := Field.Name
  else Result := '';
end;

Finally I wrote some routines to return the type, address and value of a published field, once you have a proper TPublishedField pointer in hand. These are useful when you are writing your own functions that iterate the published fields of a class.

// Field must come from AClass's own field table (TypeIndex is per class)
function GetPublishedFieldType(AClass: TClass; Field: PPublishedField): TClass;
var
  Pft: PPft;
begin
  Pft := GetPft(AClass);
  if Assigned(Pft) and Assigned(Pft.FieldTypes) and Assigned(Field)
    and (Field.TypeIndex < Pft.FieldTypes.TypeCount)
  then Result := Pft.FieldTypes.Types[Field.TypeIndex]^
  else Result := nil;
end;

function GetPublishedFieldAddr(Instance: TObject; Field: PPublishedField): PObject;
begin
  if Assigned(Field)
  then Result := PObject(PChar(Instance) + Field.Offset)
  else Result := nil;
end;

function GetPublishedFieldValue(Instance: TObject; Field: PPublishedField): TObject;
var
  FieldAddr: PObject;
begin
  FieldAddr := GetPublishedFieldAddr(Instance, Field);
  if Assigned(FieldAddr)
  then Result := FieldAddr^
  else Result := nil;
end;

Phew! Lots of boring plumbing code there. With that under our wings we can write a reverse engineering function that dumps a reconstructed Pascal class declaration containing all the published fields of a class.

// Format requires SysUtils in the uses clause
procedure DumpPublishedFields(AClass: TClass); overload;
var
  i : integer;
  Count: integer;
  Field: PPublishedField;
  FieldType: TClass;
  FieldTypeName: string;
  ParentClass: string;
begin
  while Assigned(AClass) do
  begin
    Count := GetPublishedFieldCount(AClass);
    if Count > 0 then
    begin
      if AClass.ClassParent <> nil
      then ParentClass := '('+AClass.ClassParent.ClassName+')'
      else ParentClass := '';
      writeln('type');
      writeln('  ', AClass.ClassName, ' = class', ParentClass);
      writeln('  published');
      Field := GetFirstPublishedField(AClass);
      for i := 0 to Count-1 do
      begin
        FieldType := GetPublishedFieldType(AClass, Field);
        // Avoid calling ClassName on a nil class reference
        if Assigned(FieldType)
        then FieldTypeName := FieldType.ClassName
        else FieldTypeName := '?';
        writeln(Format('    %s: %s; // Offs=%d, Index=%d',
          [Field.Name, FieldTypeName, Field.Offset, Field.TypeIndex]));
        Field := GetNextPublishedField(AClass, Field);
      end;
      writeln('  end;');
      writeln;
    end;
    AClass := AClass.ClassParent;
  end;
end;

Just for kicks I wrote a corresponding dumping routine for an object instance that also writes the current value of each field. It is more or less identical to the code above, with the addition of a call to GetPublishedFieldValue to get the value of the field in the given instance. Then, to test the code, I wrote this:

type
  {$M+}
  TMyClass = class
  published
    A: TObject;
    LongName: TComponent;
    B: TObject;
    C: TList;
    A2: TObject;
    L2ongName: TComponent;
    B2: TObject;
    C2: TList;
  end;

procedure Test;
begin
  DumpPublishedFields(TMyClass);
end;

And the output is:

type
  TMyClass = class(TObject)
  published
    A: TObject; // Offs=4, Index=0
    LongName: TComponent; // Offs=8, Index=1
    B: TObject; // Offs=12, Index=0
    C: TList; // Offs=16, Index=2
    A2: TObject; // Offs=20, Index=0
    L2ongName: TComponent; // Offs=24, Index=1
    B2: TObject; // Offs=28, Index=0
    C2: TList; // Offs=32, Index=2
  end;

Well, that was a lot of fun! :-)

Now we have documented three of the more interesting undocumented VMT fields that point to RTTI information generated by the compiler:

  TVmt = packed record
    // ..
    FieldTable : PPft;
    MethodTable : PPmt;
    DynamicTable : PDmt;
    // ..
  end;

There are still four fields we haven’t looked at yet:

  TVmt = packed record
    // ..
    IntfTable : Pointer;
    AutoTable : Pointer;
    InitTable : Pointer;
    TypeInfo : Pointer;
    // ..
  end;

If time and interest permit, we might look at these in upcoming articles.

Acknowledgement: Ray Lischner has documented most of these RTTI structures in his excellent book Delphi in a Nutshell. I'm digging into and reverse engineering these structures independently, but it is fun to confirm my findings against what Ray wrote.

4 comments:

Anonymous said...

Where is the definition of the GetVMT function?

Hallvards New Blog said...

> Where is the definition of the GetVMT function?

Click the link for MethodTable and DynamicTable above.

I'm planning to post the full HVVMT unit with test projects in CodeCentral a little later.

I will try to post fully compiling snippets.

Anonymous said...

Quite interesting. But, what do you need a field's type for?

Hallvards New Blog said...

> what do you need a field's type for?

Well, the type array is used to find the run-time TClass reference from the type-name string in the .DFM (see the TReader implementation).

But you are right that the TypeIndex link from the TPublishedField record to the type array does not currently seem to be used. It is useful when we are "de-compiling" like we do here. In theory, VCL could check that the field's declared type is compatible with the component type created from the DFM. Currently it doesn't do that (maybe for performance reasons?), and the IDE is responsible for keeping the field type and component type in sync.

Note that the IDE cannot use the published field RTTI to check this (as the form's code hasn't been compiled yet); instead it just checks that the field type string from the .pas file matches the component type (I think).



Copyright © 2004-2007 by Hallvard Vassbotn