Hallvard's Blog: Published fields details

In the previous article we looked at how published fields are used by the IDE and VCL to make component references easy to use and to find class references from component type strings. Now we want to dig deeper into the implementation details of published fields. By analyzing the assembly code in TObject.FieldAddress, I was able to reconstruct these approximate Pascal structures:

type
  TPublishedField = packed record
    Offset: Integer;
    Filler: word;  // ??
    Name: {packed} Shortstring; // really string[Length(Name)]
  end;
  PPft = ^TPft;
  TPft = packed record
    Count: Word;
    Filler: LongWord; //??
    Fields: array[0..High(Word)-1] of TPublishedField; // really [0..Count-1]
  end;
  
  PVmt = ^TVmt;
  TVmt = packed record
    // ...
    FieldTable        : PPft;
    // ...
  end;

The FieldTable field in the TVmt structure we’re reverse engineering is defined as a PPft, a pointer to a published field table. The Pft starts with a 2-byte count, followed by four unknown bytes that TObject.FieldAddress skips, and then an array of variable-length TPublishedField records. As in other RTTI structures, the shortstring fields are packed so that they take up only enough space to hold a length byte and the name string. The TPublishedField record contains an Offset into the object instance where the field can be found, 2 unknown bytes and the packed shortstring with the name of the field. We’ll figure out the meaning of these unknown fields shortly.

Luckily, the GetFieldClassTable routine in the implementation section of the Classes unit (which we discussed in the last article) documents clearly that the Filler field of the TPft record points to a list of class references. With this information we can update our structures.

type
  PClass  = ^TClass;
  PPublishedFieldTypes = ^TPublishedFieldTypes;
  TPublishedFieldTypes = packed record
    TypeCount: word;
    Types: array[0..High(Word)-1] of PClass; // really [0..TypeCount-1]
  end;
  TPft = packed record
    Count: Word;
    FieldTypes: PPublishedFieldTypes;
    Fields: array[0..High(Word)-1] of TPublishedField; // really [0..Count-1]
  end;

Now we have identified the FieldTypes field that points to a record with a TypeCount and an array of class references. Note that the class references have an extra level of indirection. TClass references are already pointers, but the array actually contains pointers to TClass references. The reason for this is to support RTTI info and TClass VMTs that reside in different modules (packages). We see the same indirection by pointer in the TypInfo unit’s use of PPTypeInfo pointers, in the implementation of global variables across units and the InstanceSize and Parent (class) fields of the TVmt. The Delphi package support code generated by the linker automatically fixes up these pointers after all static packages have been loaded by the application.

We still have the unknown filler field in the TPublishedField record. When I first started to write test code and dump this field from the RTTI of selected test classes, it looked like a sequential field index, as the values started at 0 and increased steadily: 1, 2, 3. But when I added a second published TObject field, its index was 0 again. Hmm. Combined with the seemingly missing link to the FieldTypes array, I quickly realized that the unknown TPublishedField field was an index into the type reference array.

This also confirmed that the FieldTypes array only contains unique class references. If you have 10 published TLabel fields, there will be only 1 TLabel reference in the FieldTypes array. For large forms with many components of the same type, this saves a little space in the TPublishedField record: each type index is only 2 bytes, while a direct TClass reference would take up 4 bytes. More importantly, the FieldTypes array can now be used to quickly translate from a component class name string into a class reference, without wasting time scanning through duplicate class references. As we saw in the last article, this is just what the private TReader.FindComponentClass method does.
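To make the lookup concrete, here is a minimal sketch of a name-to-class search over the type array. FindFieldTypeByName is a hypothetical helper of my own naming, not the VCL's actual code, and it assumes the PPublishedFieldTypes declaration shown above is in scope.

```pascal
// Hypothetical helper (not part of the VCL or the article's unit):
// scan the unique class references for a matching class name.
function FindFieldTypeByName(FieldTypes: PPublishedFieldTypes;
  const AClassName: string): TClass;
var
  i: Integer;
begin
  if Assigned(FieldTypes) then
    for i := 0 to FieldTypes.TypeCount - 1 do
    begin
      // Each entry is a pointer to a TClass, hence the dereference
      Result := FieldTypes.Types[i]^;
      // ClassNameIs compares case-insensitively
      if Result.ClassNameIs(AClassName) then
        Exit;
    end;
  Result := nil;
end;
```

Because every class appears only once in the array, the loop runs at most TypeCount times no matter how many fields share a type.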

After digging through and figuring out the meaning of all the fields, we now have the following type declarations.

type
  PClass  = ^TClass;
  PPublishedField = ^TPublishedField;
  TPublishedField = packed record
    Offset: Integer;
    TypeIndex: word;  // Index into the FieldTypes array below
    Name: {packed} Shortstring; // really string[Length(Name)]
  end;
  PPublishedFieldTypes = ^TPublishedFieldTypes;
  TPublishedFieldTypes = packed record
    TypeCount: word;
    Types: array[0..High(Word)-1] of PClass; // really [0..TypeCount-1]
  end;
  TPublishedFields = packed array[0..High(Word)-1] of TPublishedField;
  PPft = ^TPft;
  TPft = packed record
    Count: Word;
    FieldTypes: PPublishedFieldTypes;
    Fields: TPublishedFields; // really [0..Count-1]
  end;

Apart from the FieldTypes array and the TypeIndex field, this looks strikingly similar to the RTTI structures for published methods. To get a kick-start with writing the utility routines to search and iterate the field table structures I simply used the age-old copy-and-paste and search-and-replace trick.

function GetPft(AClass: TClass): PPft;
var
  Vmt: PVmt;
begin
  Vmt := GetVmt(AClass);
  if Assigned(Vmt)
  then Result := Vmt.FieldTable
  else Result := nil;
end;
 
function GetPublishedFieldCount(AClass: TClass): integer;
var
  Pft: PPft;
begin
  Pft := GetPft(AClass);
  if Assigned(Pft)
  then Result := Pft.Count
  else Result := 0;
end;

The cryptically named GetPft function returns a pointer to the published field table given a class reference. It uses the GetVmt function to get a pointer to the “magic” part of the virtual method table (VMT) and then simply returns the value of the FieldTable field. The GetPublishedFieldCount function returns the number of published fields in a given class reference (not including the fields of parent classes).
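Since GetPublishedFieldCount only covers one class, a total for the whole hierarchy needs a walk up the parent chain. The following is a small sketch of that idea; GetTotalPublishedFieldCount is a hypothetical name, and the code assumes GetPublishedFieldCount from above is in scope.

```pascal
// Hypothetical helper: sum the published field counts of AClass
// and all of its ancestors.
function GetTotalPublishedFieldCount(AClass: TClass): Integer;
begin
  Result := 0;
  while Assigned(AClass) do
  begin
    Inc(Result, GetPublishedFieldCount(AClass));
    AClass := AClass.ClassParent;  // nil once we are past TObject
  end;
end;
```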

The routines to iterate the published fields of a class, using both index-based lookup and GetFirst/GetNext-based iteration, also converted cleanly.

function GetNextPublishedField(AClass: TClass;
  PublishedField: PPublishedField): PPublishedField;
begin
  Result := PublishedField;
  if Assigned(Result) then
    Inc(PChar(Result),   SizeOf(Result.Offset)
                       + SizeOf(Result.TypeIndex)
                       + SizeOf(Result.Name[0])
                       + Length(Result.Name));
end;

function GetPublishedField(AClass: TClass; 
  FieldIndex: integer): PPublishedField;
var
  Pft: PPft;
begin
  // FieldIndex is the position of the field in the table,
  // not the TypeIndex stored inside each record
  Pft := GetPft(AClass);
  if Assigned(Pft) and (FieldIndex < Pft.Count) then
  begin
    Result := @Pft.Fields[0];
    while FieldIndex > 0 do
    begin
      Result := GetNextPublishedField(AClass, Result);
      Dec(FieldIndex);
    end;
  end
  else
    Result := nil;
end;

function GetFirstPublishedField(AClass: TClass): PPublishedField;
begin
  Result := GetPublishedField(AClass, 0);
end;

The only real difference here is that the TPublishedField record does not contain a field with the explicit size of the variable-sized record (as is the case with TPublishedMethod). Instead we must use the size of the fixed fields plus the length of the name field to move the current pointer to the next record in the array. As before, the caller is responsible for calling GetNextPublishedField the correct number of times (using GetPublishedFieldCount).
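As a worked example of that arithmetic, this small console program computes the size of one packed record from the field name alone. PublishedFieldSize is a hypothetical helper, and the numbers assume the layout described in this article (4-byte Offset, 2-byte TypeIndex, 1 length byte).

```pascal
program FieldRecordSize;
{$APPTYPE CONSOLE}

// Hypothetical helper: bytes occupied by one packed TPublishedField
// entry, assuming the layout reconstructed in this article.
function PublishedFieldSize(const Name: ShortString): Integer;
begin
  Result := SizeOf(Integer)   // Offset
          + SizeOf(Word)      // TypeIndex
          + SizeOf(Byte)      // length byte of the packed shortstring
          + Length(Name);     // the characters of the name
end;

begin
  Writeln(PublishedFieldSize('A'));        // 4 + 2 + 1 + 1 = 8
  Writeln(PublishedFieldSize('LongName')); // 4 + 2 + 1 + 8 = 15
end.
```

This is the same sum GetNextPublishedField adds to the pointer, just computed from a name instead of a live record.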

Then we have the searching routines that find a specific published field given different searching criteria, such as field name, field offset or field address. These use the iteration functions above. If successful they return a pointer to the relevant TPublishedField record inside the RTTI structures, otherwise they return nil.

function FindPublishedFieldByName(AClass: TClass; 
  const AName: ShortString): PPublishedField;
var
  i : integer;
begin
  while Assigned(AClass) do
  begin
    Result := GetFirstPublishedField(AClass);
    for i := 0 to GetPublishedFieldCount(AClass)-1 do
    begin
      // Note: Length(ShortString) expands to efficient inline code
      if (Length(Result.Name) = Length(AName)) and
         (StrLIComp(@Result.Name[1], @AName[1], Length(AName)) = 0) then
        Exit;
      Result := GetNextPublishedField(AClass, Result);
    end;
    AClass := AClass.ClassParent;
  end;
  Result := nil;
end;

function FindPublishedFieldByOffset(AClass: TClass; 
  AOffset: Integer): PPublishedField;
var
  i : integer;
begin
  while Assigned(AClass) do
  begin
    Result := GetFirstPublishedField(AClass);
    for i := 0 to GetPublishedFieldCount(AClass)-1 do
    begin
      if Result.Offset = AOffset then
        Exit;
      Result := GetNextPublishedField(AClass, Result);
    end;
    AClass := AClass.ClassParent;
  end;
  Result := nil;
end;

function FindPublishedFieldByAddr(Instance: TObject; 
  AAddr: Pointer): PPublishedField;
begin
  Result := FindPublishedFieldByOffset(Instance.ClassType, 
    PChar(AAddr) - PChar(Instance));
end;

Working directly with the TPublishedField pointers returned by the three functions above can be a little awkward, so I’ve also written a few wrapper routines that return simple values for the offset, address and name of a field in a given class or object reference.

function FindPublishedFieldOffset(AClass: TClass; 
  const AName: ShortString): integer;
var
  Field: PPublishedField;
begin
  Field := FindPublishedFieldByName(AClass, AName);
  if Assigned(Field)
  then Result := Field.Offset
  else Result := -1;
end;

function FindPublishedFieldAddr(Instance: TObject; 
  const AName: ShortString): PObject;
var
  Offset: integer;
begin
  Offset := FindPublishedFieldOffset(Instance.ClassType, AName);
  if Offset >= 0
  then Result := PObject(PChar(Instance) + Offset)
  else Result := nil;
end;

function FindPublishedFieldName(AClass: TClass; 
  AOffset: integer): Shortstring; overload;
var
  Field: PPublishedField;
begin
  Field := FindPublishedFieldByOffset(AClass, AOffset);
  if Assigned(Field)
  then Result := Field.Name
  else Result := '';
end;

function FindPublishedFieldName(Instance: TObject; 
  AAddr: Pointer): Shortstring; overload;
var
  Field: PPublishedField;
begin
  Field := FindPublishedFieldByAddr(Instance, AAddr);
  if Assigned(Field)
  then Result := Field.Name
  else Result := '';
end;
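A quick way to sanity-check these wrappers is to compare them with TObject.FieldAddress, the RTL routine the structures were reconstructed from. The sketch below assumes the routines above and the Classes unit are in scope; TDemo and CheckAgainstFieldAddress are hypothetical names used only for illustration.

```pascal
type
  {$M+}
  TDemo = class   // hypothetical test class
  published
    First: TObject;
    Second: TList;
  end;
  {$M-}

procedure CheckAgainstFieldAddress;
var
  Demo: TDemo;
begin
  Demo := TDemo.Create;
  try
    // Our lookup should agree with the RTL and with the real field address
    Assert(Pointer(FindPublishedFieldAddr(Demo, 'Second')) =
           Demo.FieldAddress('Second'));
    Assert(Pointer(FindPublishedFieldAddr(Demo, 'Second')) =
           Pointer(@Demo.Second));
    // Reverse lookup: from field address back to field name
    Assert(FindPublishedFieldName(Demo, @Demo.Second) = 'Second');
    // Unknown names yield nil rather than raising an exception
    Assert(FindPublishedFieldAddr(Demo, 'NoSuchField') = nil);
  finally
    Demo.Free;
  end;
end;
```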

Finally I wrote some routines to return the type, address and value of a published field, once you have a proper TPublishedField pointer in hand. These are useful when you are writing your own functions that iterate the published fields of a class.

function GetPublishedFieldType(AClass: TClass; Field: PPublishedField): TClass;
var
  Pft: PPft;
begin
  Pft := GetPft(AClass);
  if Assigned(Pft) and Assigned(Field) and (Field.TypeIndex < Pft.FieldTypes.TypeCount)
  then Result := Pft.FieldTypes.Types[Field.TypeIndex]^
  else Result := nil;
end;

function GetPublishedFieldAddr(Instance: TObject; Field: PPublishedField): PObject;
begin
  if Assigned(Field)
  then Result := PObject(PChar(Instance) + Field.Offset)
  else Result := nil;
end;

function GetPublishedFieldValue(Instance: TObject; Field: PPublishedField): TObject;
var
  FieldAddr: PObject;
begin
  FieldAddr := GetPublishedFieldAddr(Instance, Field);
  if Assigned(FieldAddr)
  then Result := FieldAddr^
  else Result := nil;
end;

Phew! Lots of boring plumbing code there. With that under our belts we can write a reverse engineering function that dumps a reconstructed Pascal class declaration containing all the published fields of a class.

procedure DumpPublishedFields(AClass: TClass); overload;
var
  i : integer;
  Count: integer;
  Field: PPublishedField;
  FieldType: TClass;
  ParentClass: string;
begin
  while Assigned(AClass) do
  begin
    Count := GetPublishedFieldCount(AClass);
    if Count > 0 then
    begin
      if AClass.ClassParent <> nil 
      then ParentClass := '('+AClass.ClassParent.ClassName+')'
      else ParentClass := '';
      writeln('type');
      writeln('  ', AClass.ClassName, ' = class', ParentClass);
      writeln('  published');
      Field := GetFirstPublishedField(AClass);
      for i := 0 to Count-1 do
      begin
        FieldType  := GetPublishedFieldType(AClass, Field);
        writeln(Format('    %s: %s; // Offs=%d, Index=%d',
          [Field.Name, FieldType.ClassName, Field.Offset, Field.TypeIndex]));
        Field := GetNextPublishedField(AClass, Field);
      end;
      writeln('  end;');
      writeln;
    end;
    AClass := AClass.ClassParent;
  end;
end;

Just for kicks I wrote a corresponding dumping routine for an object instance that also writes the current value for each field – it is more or less identical to the code above with the addition of a call to GetPublishedFieldValue to get the value of the field in the given instance. Then to test the code, I wrote this:

type
  {$M+}
  TMyClass = class
  published
    A: TObject;
    LongName: TComponent;
    B: TObject;
    C: TList;
    A2: TObject;
    L2ongName: TComponent;
    B2: TObject;
    C2: TList;
  end;

procedure Test;
begin
  DumpPublishedFields(TMyClass);
end;

And the output is:

type
  TMyClass = class(TObject)
  published
    A: TObject; // Offs=4, Index=0
    LongName: TComponent; // Offs=8, Index=1
    B: TObject; // Offs=12, Index=0
    C: TList; // Offs=16, Index=2
    A2: TObject; // Offs=20, Index=0
    L2ongName: TComponent; // Offs=24, Index=1
    B2: TObject; // Offs=28, Index=0
    C2: TList; // Offs=32, Index=2
  end;

Well, that was a lot of fun! :-)
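The instance-based dumping routine mentioned earlier is not shown in this article. The following is only a guess at what such a routine might look like, assuming the utility routines above and SysUtils are in scope; the output format is my own choice.

```pascal
// Sketch only: dumps each published field of an instance together
// with the object reference currently stored in it.
procedure DumpPublishedFields(Instance: TObject); overload;
var
  i: Integer;
  AClass: TClass;
  Field: PPublishedField;
  FieldType: TClass;
  Value: TObject;
begin
  if not Assigned(Instance) then
    Exit;
  AClass := Instance.ClassType;
  while Assigned(AClass) do
  begin
    Field := GetFirstPublishedField(AClass);
    for i := 0 to GetPublishedFieldCount(AClass) - 1 do
    begin
      FieldType := GetPublishedFieldType(AClass, Field);
      Value := GetPublishedFieldValue(Instance, Field);
      // %p prints the raw object reference; nil means unassigned
      writeln(Format('%s.%s: %s = %p',
        [AClass.ClassName, Field.Name, FieldType.ClassName, Pointer(Value)]));
      Field := GetNextPublishedField(AClass, Field);
    end;
    AClass := AClass.ClassParent;
  end;
end;
```

For a freshly created TMyClass instance every value should print as nil, since object fields are zero-initialized by the constructor.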

Now we have documented three of the more interesting undocumented VMT fields that point to RTTI information generated by the compiler:

  TVmt = packed record
    // ..
    FieldTable        : PPft;
    MethodTable       : PPmt;
    DynamicTable      : PDmt;
    // ..
  end;

There are still four fields we haven’t looked at yet:

  TVmt = packed record
    // ..
    IntfTable         : Pointer; 
    AutoTable         : Pointer;
    InitTable         : Pointer;
    TypeInfo          : Pointer;
    // ..
  end;

If time and interest permit, we might look at these in upcoming articles.

Acknowledgement. Note that Ray Lischner has documented most of these RTTI structures in his excellent Delphi in a Nutshell book. I'm digging and reverse engineering these structures independently, but it is fun to confirm my findings with what Ray wrote.

4 comments:

Anonymous said...: Where is the definition of the GetVMT function?; 21 May, 2006 18:35
Hallvards New Blog said...: > Where is the definition of the GetVMT function?

Click the link for MethodTable and DynamicTable above.

I'm planning to post the full HVVMT unit with test projects in CodeCentral a little later.

I will try to post fully compiling snippets.; 21 May, 2006 19:35
Anonymous said...: Quite interesting. But, what do you need a field's type for?; 22 May, 2006 08:57
Hallvards New Blog said...: > what do you need a field's type for?

Well, the type array is used to find the run-time TClass reference from the type-name string in the .DFM (see the TReader implementation).

But you are right that the TypeIndex link from the TPublishedField record to the type array does not currently seem to be used. It is useful when we are "de-compiling" like we do here. In theory, VCL could check that the field's declared type is compatible with the component type created from the DFM. Currently it doesn't do that (maybe for performance reasons?) - and the IDE is responsible for keeping the field type and component type in sync.

Note that the IDE cannot use the published field RTTI to check (as the form's code hasn't been compiled yet) - instead it just checks that the field type string from the .pas file matches the component type (I think).; 24 May, 2006 11:20

Saturday, May 20, 2006