Hallvard's Blog: More fun with Enumerators

As part of the new language syntax inherited from Delphi.NET, native Delphi now (since Delphi 2005) supports for-in loops (known as foreach in C#). The new syntax is easy to read, and it reduces the clutter of maintaining a loop index variable, checking boundary conditions (typically 0 and Count-1) and indexing into the array or list.

While Delphi has special built-in support for for-in for (got it? ;) arrays, strings, and set types, the RTL and VCL also implements support for for-in by implementing an enumerator pattern. You can do the same with your own collection classes. There are a number of ways to implement such enumerators. We will look at a couple of variants, studying the code that the compiler generates for them. We will mainly focus on getting as efficient code as possible.

The Enumerator Pattern

Primoz Gabrijelcic has already posted several excellent blog posts about how to write enumerators in Delphi - please read them first if you haven't already.

Basically, when writing support for the for-in loop in one of your own classes or records (you can have records with methods these days, you know) you need to provide a single function called GetEnumerator. This function needs to return an instance of a class or a record that needs to have a public MoveNext function and a public Current property.

Here is a complete example that closely mimics the way TList implements its enumerator.

type
  TMyObject = class(TObject)
    procedure Foo;
  end;
  TMyList = class;
  TMyListEnumerator = class
  private
    FIndex: Integer;
    FMyList: TMyList;
  public
    constructor Create(AMyList: TMyList);
    function MoveNext: Boolean;
    function GetCurrent: TMyObject;
    property Current: TMyObject read GetCurrent;
  end;
  TMyList = class(TList) 
  public
    procedure Add(AMyObject: TMyObject);
    function GetEnumerator: TMyListEnumerator;
  end;

implementation

{ TMyObject }

procedure TMyObject.Foo;
begin
end;

{ TMyList }

procedure TMyList.Add(AMyObject: TMyObject);
begin
  inherited Add(AMyObject);
end;

function TMyList.GetEnumerator: TMyListEnumerator;
begin
  Result := TMyListEnumerator.Create(Self);
end;

{ TMyListEnumerator }

constructor TMyListEnumerator.Create(AMyList: TMyList);
begin
  inherited Create;
  FIndex := -1;
  FMyList := AMyList;
end;

function TMyListEnumerator.GetCurrent: TMyObject;
begin
  Result := FMyList.List[FIndex];
end;

function TMyListEnumerator.MoveNext: Boolean;
begin
  Result := FIndex < FMyList.Count - 1;
  if Result then
    Inc(FIndex);
end;

With this code in place we can now create instances of TMyList and run the shiny new for-in loops on them.

procedure Test;  
var
  MyList: TMyList;
  MyObject: TMyObject;
begin
  MyList := TMyList.Create;
  for MyObject in MyList do
    MyObject.Foo;
end;

The Generated Code

This is all pretty straight-forward code, but let's look one level deeper and look at the assembly code that the compiler generates from this code. This is as easy as setting a break-point on the for-in loop, hitting run (F5) and then press Ctrl+Alt+C to open the CPU window. The code we see looks like this (from Delphi 2007).

for MyObject in MyList do
    call TMyList.GetEnumerator
    mov [ebp-$04],eax
    xor eax,eax
    push ebp
    push $0040ff8c
    push dword ptr fs:[eax]
    mov fs:[eax],esp
    jmp $0040ff62
    mov eax,[ebp-$04]
    call TMyListEnumerator.GetCurrent
    mov ebx,eax
  MyObject.Foo;
    mov eax,ebx
    call TMyObject.Foo
  for MyObject in MyList do
    mov eax,[ebp-$04]
    call TMyListEnumerator.MoveNext
    test al,al
    jnz $0040ff51
    xor eax,eax
    pop edx
    pop ecx
    pop ecx
    mov fs:[eax],edx
    push $0040ff93
  MyObject.Foo;
    cmp dword ptr [ebp-$04],$00
    jz $0040ff8b
    mov dl,$01
    mov eax,[ebp-$04]
    call TObject.Destroy
    ret 
    jmp @HandleFinally
    jmp $0040ff7b

Wow! That sure was a lot of machine code from two innocent looking lines of Pascal!

Without digging into the semantics of each assembly instruction, we can quickly see that there are calls to four methods (GetEnumerator, GetCurrent, Foo and MoveNext) and one destructor (TObject.Destroy). There are also some magic gyrations involving FS: [xx] segment overrrides and a jump instruction to HandleFinally - this is the hallmark of a try-finally implementation.

If we try and expand the for-in loop into Pascal code, it would look something like this.

procedure TestImpl;  
var
  MyList: TMyList;
  MyObject: TMyObject;
  MyListEnumerator: TMyListEnumerator;
begin
  MyList := TMyList.Create;
  MyListEnumerator := MyList.GetEnumerator;
  try
    while MyListEnumerator.MoveNext do
    begin
      MyObject := MyListEnumerator.Current;
      MyObject.Foo;
    end;
  finally
    if Assigned(MyListEnumerator) then
      MyListEnumerator.Destroy;
  end;
end;

Is all of this overhead really necessary? Not really. While all of the enumerators in the Delphi RTL and VCL (and even in Primoz' sample code) are implemented as classes (forcing the the compiler to implicitly free the enumerator instance after the loop), there is no rule that says that all enumerators must be implemented by classes.

Enumerator Records

No, light-weight records with methods can implement enumerators, too - and they are allocated directly on the stack and has no need for calling a destructor. So changing the enumerator from a class to a record will make the code smaller and faster. Most enumerators can be favorably be implemented as records. In fact, all the existing enumerators in the Delphi RTL and VCL could be easily re-implemented as records - producing smaller and faster code for all for-in loops that use them. I can see only two reasons to use classes for enumerators; if the class needs to free something in an overridden destructor, or if the class needs to inherit functionality from another class. In all other cases, enumerators should be implemented as records.

Another observation is that the implementations for GetCurrent and MoveNext (and even GetEnumerator) are typically very short and thus perfect candidates for inlining. The two first methods will be called once for every item iterated over in the collection, so it makes sense to inline them at the call site.

So, lets change the sample code above with these two optimizations. To better see the effect on the generated code, we'll do one change at the time. First we replace the class with a record - yielding the following simple changes in the code.

type
  TMyListEnumerator = record
  private
    FIndex: Integer;
    FMyList: TMyList;
  public
    constructor Create(AMyList: TMyList);
    function MoveNext: Boolean;
    function GetCurrent: TMyObject;
    property Current: TMyObject read GetCurrent;
  end;

constructor TMyListEnumerator.Create(AMyList: TMyList);
begin
//  inherited Create;
  FIndex := -1;
  FMyList := AMyList;
end;

The only changes we did was to change "class" to "record" and comment out the call to the inherited constructor (as records do not inherit anything - and cannot have parameterless constructors). Without the need to free the enumerator instance, the code the compiler generates for the for-in loop is now much simpler.

  for MyObject in MyList do
    mov edx,esp
    mov eax,ebx
    call TMyList.GetEnumerator
    jmp $0041010d
    mov eax,esp
    call TMyListEnumerator.GetCurrent
    mov ebx,eax
  MyObject.Foo;
    mov eax,ebx
    call TMyObject.Foo
  for MyObject in MyList do
    mov eax,esp
    call TMyListEnumerator.MoveNext
    test al,al
    jnz $004100fd

We can still see the three method calls to GetEnumerator, GetCurrent, Foo and MoveNext, but the heavy try-finally code and the Destroy call are gone. If we try to write this in Pascal again, it would be something like this.

procedure TestImpl;  
var
  MyList: TMyList;
  MyObject: TMyObject;
  MyListEnumerator: TMyListEnumerator;
begin
  MyList := TMyList.Create;
  MyListEnumerator := MyList.GetEnumerator;
  while MyListEnumerator.MoveNext do
  begin
    MyObject := MyListEnumerator.Current;
    MyObject.Foo;
  end;
end;

That is much better, don't you think!? ;)

Inlining the Loop Calls

While the class-to-record optimization gave some nice code size savings, for long running loops it doesn't really affect the overall running time, because it only affects what happens before and after the loop.

The next step is to turn on inlining the the small and simple enumerator functions. This is as easy as adding the inline directive to the declaration of the methods, like this.

  TMyListEnumerator = record
  private
    FIndex: Integer;
    FMyList: TMyList;
  public
    constructor Create(AMyList: TMyList); 
    function MoveNext: Boolean; inline;
    function GetCurrent: TMyObject; inline;
    property Current: TMyObject read GetCurrent;
  end;

Here we have just added "inline;" to the MoveNext and GetCurrent methods. Recompiling, running and hitting the break-point again, we can inspect the assembly code once again.

  for MyObject in MyList do
    mov edx,esi
    mov eax,ebx
    call TMyList.GetEnumerator
    jmp $00410222
    mov eax,[esi+$04]
    mov eax,[eax+$04]
    mov edx,[esi]
    mov ebx,[eax+edx*4]
  MyObject.Foo;
    mov eax,ebx
    call TMyObject.Foo
  for MyObject in MyList do
    mov eax,esi
    call TMyListEnumerator.MoveNext
    test al,al
    jnz $00410210

Notice that the GetCurrent call is now gone - instead the assembly instructions for its implementation have been merged into our loop. This is good as excessive branching inside a loop brings down performance. But notice that the MoveNext call is still there. For some reason the compiler is not heeding our inline directive for the MoveNext function.

While Expressions Not Inlined

As discussed in my DN4DP piece on Delphi inlining, the inline directive is just a hint that the compiler should try to inline the call if possible - there are a number of documented and undocumented cases where it will not be inlined. It looks like we have stumbled onto one of the undocumented cases here - the MoveNext function of an enumerator will not currently (D2007) be inlined in the code that the compiler generates for a for-in loop.

I've done some more testing and digging, and it seems like no functions are inlined if they appear inside a while loop control expression. Given some mock-up test code:

procedure Foo;
begin
end;

function Inlined(var I: integer): boolean; inline;
begin
  Dec(I);
  Result := I <> 0;
end;

We can now used this inlined function in a while loop.

procedure Test1;
var
  I: integer;
begin
  I := 100;
  while Inlined(I) do
    Foo;
end;

If the inlining worked we should see no call to the Inlined function in the generated assembly code.

  I := 100;
    mov [esp],$00000064
    jmp $0040fdc7
  Foo;
    call Foo
  while Inlined(I) do
    mov eax,esp
    call Inlined
    test al,al
    jnz $0040fdc2

Alas, there is clearly a "call Inlined" in there :-(.

Transforming a While Loop

Let's experiment a little. Testing shows that the Inlined routine is properly inlined for a simple if-test. It is possible to rewrite a while loop with an expression into a while-true loop with an if-statement on the negated expression and a Break. We can rewrite the non-inlining while loop in Test1 with a semantically equivalent loop.

procedure Test2;
var
  I: integer;
begin
  I := 100;
  while True do
  begin
    if not Inlined(I) then
      Break;
    Foo;
  end;
end;

Looking at the generated code again (we're really getting the hang of this, right?;)), we can see that the inlining does work just fine in this loop.

  I := 100;
    mov [esp],$00000064
  if not Inlined(I) then
    dec dword ptr [esp]
    cmp dword ptr [esp],$00
    setnz bl
    test bl,bl
    jz $0040fdf2
    TestInlining.dpr.57: Foo;
    call Foo
  while True do
    jmp $0040fddd

This shows that the compiler is able to inline loop control like this - it just needs a little assistance ;).

Quality Central Reports

CodeGear could fix this in two ways;

Fix the compiler so that while-loop expressions can be inlined (this is the best solution). This should then automatically also inline the MoveNext call that the compiler generates for for-in loops.

If this is hard or impossible for some reason, at least the for-in loop could be changed to generate a while-true loop with if not MoveNext then Break; logic. Thus would ensure that for-in loops can become more efficient than today.

While they are at it, they could also:

Refactor all existing RTL and VCL enumerators from class to record

Inline all GetCurrent and MoveNext functions in all enumerators

Yes, I know should probably log these issues in Quality Central. And I will - eventually ;).

Update: I have now taken the time to report these issues and suggestions in Quality Central. Note: I think the first QC report ended up as a Beta report and may thus not be publically viewable - but it seems CodeGear have noticed anyway ;).

QC#53623 - Function calls inside while expressions are not inlined

QC#53737 - For-in codegen: Enumerator MoveNext calls are not inlined

QC#53738 - Change all RTL and VCL enumerators from class to record

QC#53739 - Inline GetCurrent and MoveNext in all RTL and VCL enumerators

17 comments:

gabr42 said...: Great stuff, as usual!; 17 October, 2007 17:13
Unknown said...: Excellent write-up! And, for the record, the for-in syntax preceded the addition of methods on records. Not that that is any kind of excuse :-)
You should most definitely get the issues you discovered and the suggestion to use records instead of classes into QC.; 17 October, 2007 18:56
Anonymous said...: I think the MoveNext inline problem is document in the D2007 help:

"Procedures and functions used in conditional expressions in while-do and repeat-until statements cannot be expanded inline."

OK, I'll admit that I didn't understand this until I read your blogpost ;-); 17 October, 2007 19:14
Anonymous said...: Can you use a TInferfacedObject for your enumerator?; 17 October, 2007 19:15
Anonymous said...: Really great!

Interesting but fact that an Enumerator could also be an Interface, not only a Class or a Record (this fact is documented). Also an Enumerator could be applied to a Record and to an Interface, not only to a Class (this fact is undocumented, I have several QC reports about it).

I would be great to have deep research of these cases also :p

Really it was surprise for me that record enumerators are faster than class enumerators!; 17 October, 2007 19:18
Anonymous said...: It would be interesting to see a comparison of performance w.r.t old-style for i := 0 to Pred(Count) style loops.

It seems that this post highlights something we learned even from VB for-in.... cute syntax labour savers often come with a performance penalty and ultimately require more thought (and work arounds) to ensure their use does not have unexpected negative consequences.

Of course, fixing the compiler (or rather "changing" it - it aint broke after all) would remove the need for some of the extra thought.

Q: Is there a reverse-for-in? A for-in-step? A for-in-from-X-to-Y (for a subrange of the contents of a list)?; 17 October, 2007 23:12
gabr42 said...: @Jolyon Smith:

You can write for-in-step, for-downto and other custom enumerators yourself. See the third part in my enumerator series - http://17slon.com/blogs/gabr/2007/03/fun-with-enumerators-part-3.html.; 18 October, 2007 09:35
Anonymous said...: See Enumerator and IEnumVariant -Delphi Win32 (French language) :
http://laurent-dardenne.developpez.com/articles/Delphi/2005/langage/win32/iterateurIEnumVariant/; 19 October, 2007 13:33
Anonymous said...: You have a point, but using a record enumerator also makes it impossible to derive from it.

This means that your change will break the following code:

TShapeEnumerator = class(TListEnumerator)
public
function GetCurrent: TShape;
property Current: TShape read GetCurrent;
end;

TShapeList = class(TObjectList)
private
function GetItem(Index: Integer): TShape;
procedure SetItem(Index: Integer; const Value: TShape);
public
function GetEnumerator: TShapeEnumerator;
function Add(AObject: TShape): Integer;
property Items[Index: Integer]: TShape read GetItem write SetItem; default;
end;; 13 November, 2007 21:34
Hallvards New Blog said...: Indeed.

That's why I wrote:
"I can see only two reasons to use classes for enumerators; if the class needs to free something in an overridden destructor, or if the class needs to inherit functionality from another class. In all other cases, enumerators should be implemented as records"

... ;); 14 November, 2007 12:16
Anonymous said...: Can you change your report QC53738 so that it won't break existing code?
It suggests changing TListEnumerator to a record, but that might not be such a good idea for everybody. Well, if we get generics for Win32, that might solve the problem with deriving from rtl record enumerators as well.; 14 November, 2007 12:32
Hallvards New Blog said...: Ah - I get your point now. Changing the RTL/VCL enumerators from class to record *will* of course break existing code that inherits from the enumerators.

One solution would be to rewrite the child class to just use (as a private field) the RTL/VCL enumerator record instead of inheriting from it (aggregation instead of inheritance).

Then you could change your enumerator implementation to a record as well.. :); 14 November, 2007 16:19
elector said...: Isn't this really an implementation of GoF Iterator Pattern? I'm tying to do one in D2007.
Thank you; 26 March, 2011 22:46
elector said...: I saw this question in comments but haven't seen an answer: Can this TMyList = class(TList) be implemented for TInterfaceList ?
It would be very useful.

Thank you; 04 May, 2011 10:51
Hallvards New Blog said...: elector:

Not sure what you mean.

TInterfaceList already implements an enumerator and thus does support for-in loops.

If you're asking if the TInterfaceList enumerator could be optimized to a record instead of a class, yes I think so.

But you may not be gaining too much improvement, as handling interface references (with automatic reference counting etc) will probably still be going on.; 04 May, 2011 12:57
elector said...: @Hallvard
I have tried to use the TINterfaceList but as I have found it is only possible from D2009 and I am using D2007.

http://chrisbensen.blogspot.com/2008/11/using-tinterfacelist-with-for-in-syntax.html

Thank you.; 04 May, 2011 13:44
Hallvards New Blog said...: You should be able to use for-in with a class based TInterfaceList instance in Delphi 2007.

Chris' article documents that you'll need to use Delphi 2009 or later if you want to use for-in with an *interface* based IInterfaceList.

I suggest you show the code you have problems getting to work in D2007.; 04 May, 2011 21:01

Hallvard's Blog

Wednesday, October 17, 2007

More fun with Enumerators

17 comments:

About Me

My Sites

Labels

Blog Archive

Blogs To Read

Syndication

Page Hits

What do you think of web polls?

DelphiFeeds.com