Wednesday, October 17, 2007

More fun with Enumerators

As part of the new language syntax inherited from Delphi.NET, native Delphi now (since Delphi 2005) supports for-in loops (known as foreach in C#). The new syntax is easy to read, and it reduces the clutter of maintaining a loop index variable, checking boundary conditions (typically 0 and Count-1) and indexing into the array or list.  

While Delphi has special built-in support for for-in for (got it? ;) arrays, strings, and set types, the RTL and VCL also implements support for for-in by implementing an enumerator pattern. You can do the same with your own collection classes. There are a number of ways to implement such enumerators. We will look at a couple of variants, studying the code that the compiler generates for them. We will mainly focus on getting as efficient code as possible.

The Enumerator Pattern

Primoz Gabrijelcic has already posted several excellent blog posts about how to write enumerators in Delphi - please read them first if you haven't already.

Basically, when writing support for the for-in loop in one of your own classes or records (you can have records with methods these days, you know) you need to provide a single function called GetEnumerator. This function needs to return an instance of a class or a record that needs to have a public MoveNext function and a public Current property.

Here is a complete example that closely mimics the way TList implements its enumerator.

type
TMyObject = class(TObject)
procedure Foo;
end;
TMyList = class;
TMyListEnumerator = class
private
FIndex: Integer;
FMyList: TMyList;
public
constructor Create(AMyList: TMyList);
function MoveNext: Boolean;
function GetCurrent: TMyObject;
property Current: TMyObject read GetCurrent;
end;
TMyList = class(TList)
public
procedure Add(AMyObject: TMyObject);
function GetEnumerator: TMyListEnumerator;
end;

implementation

{ TMyObject }

procedure TMyObject.Foo;
begin
end;

{ TMyList }

procedure TMyList.Add(AMyObject: TMyObject);
begin
inherited Add(AMyObject);
end;

function TMyList.GetEnumerator: TMyListEnumerator;
begin
Result := TMyListEnumerator.Create(Self);
end;

{ TMyListEnumerator }

constructor TMyListEnumerator.Create(AMyList: TMyList);
begin
inherited Create;
FIndex := -1;
FMyList := AMyList;
end;

function TMyListEnumerator.GetCurrent: TMyObject;
begin
Result := FMyList.List[FIndex];
end;

function TMyListEnumerator.MoveNext: Boolean;
begin
Result := FIndex < FMyList.Count - 1;
if Result then
Inc(FIndex);
end;

With this code in place we can now create instances of TMyList and run the shiny new for-in loops on them.

procedure Test;  
var
MyList: TMyList;
MyObject: TMyObject;
begin
MyList := TMyList.Create;
for MyObject in MyList do
MyObject.Foo;
end;

The Generated Code


This is all pretty straight-forward  code, but let's look one level deeper and look at the assembly code that the compiler generates from this code. This is as easy as setting a break-point on the for-in loop, hitting run (F5) and then press Ctrl+Alt+C to open the CPU window. The code we see looks like this (from Delphi 2007).

for MyObject in MyList do
call TMyList.GetEnumerator
mov [ebp-$04],eax
xor eax,eax
push ebp
push $0040ff8c
push dword ptr fs:[eax]
mov fs:[eax],esp
jmp $0040ff62
mov eax,[ebp-$04]
call TMyListEnumerator.GetCurrent
mov ebx,eax
MyObject.Foo;
mov eax,ebx
call TMyObject.Foo
for MyObject in MyList do
mov eax,[ebp-$04]
call TMyListEnumerator.MoveNext
test al,al
jnz $0040ff51
xor eax,eax
pop edx
pop ecx
pop ecx
mov fs:[eax],edx
push $0040ff93
MyObject.Foo;
cmp dword ptr [ebp-$04],$00
jz $0040ff8b
mov dl,$01
mov eax,[ebp-$04]
call TObject.Destroy
ret
jmp @HandleFinally
jmp $0040ff7b

Wow! That sure was a lot of machine code from two innocent looking lines of Pascal!


Without digging into the semantics of each assembly instruction, we can quickly see that there are calls to four methods (GetEnumerator, GetCurrent, Foo and MoveNext) and one destructor (TObject.Destroy). There are also some magic gyrations involving FS: [xx] segment overrrides and a jump instruction to HandleFinally - this is the hallmark of a try-finally implementation.


If we try and expand the for-in loop into Pascal code, it would look something like this.

procedure TestImpl;  
var
MyList: TMyList;
MyObject: TMyObject;
MyListEnumerator: TMyListEnumerator;
begin
MyList := TMyList.Create;
MyListEnumerator := MyList.GetEnumerator;
try
while MyListEnumerator.MoveNext do
begin
MyObject := MyListEnumerator.Current;
MyObject.Foo;
end;
finally
if Assigned(MyListEnumerator) then
MyListEnumerator.Destroy;
end;
end;

Is all of this overhead really necessary? Not really. While all of the enumerators in the Delphi RTL and VCL (and even in Primoz' sample code) are implemented as classes (forcing the the compiler to implicitly free the enumerator instance after the loop), there is no rule that says that all enumerators must be implemented by classes.


Enumerator Records


No, light-weight records with methods can implement enumerators, too - and they are allocated directly on the stack and has no need for calling a destructor. So changing the enumerator from a class to a record will make the code smaller and faster. Most enumerators can be favorably be implemented as records. In fact, all the existing enumerators in the Delphi RTL and VCL could be easily re-implemented as records - producing smaller and faster code for all for-in loops that use them. I can see only two reasons to use classes for enumerators; if the class needs to free something in an overridden destructor, or if the class needs to inherit functionality from another class. In all other cases, enumerators should be implemented as records.


Another observation is that the implementations for GetCurrent and MoveNext (and even GetEnumerator) are typically very short and thus perfect candidates for inlining. The two first methods will be called once for every item iterated over in the collection, so it makes sense to inline them at the call site.


So, lets change the sample code above with these two optimizations. To better see the effect on the generated code, we'll do one change at the time. First we replace the class with a record - yielding the following simple changes in the code. 

type
TMyListEnumerator = record
private
FIndex: Integer;
FMyList: TMyList;
public
constructor Create(AMyList: TMyList);
function MoveNext: Boolean;
function GetCurrent: TMyObject;
property Current: TMyObject read GetCurrent;
end;

constructor TMyListEnumerator.Create(AMyList: TMyList);
begin
// inherited Create;
FIndex := -1;
FMyList := AMyList;
end;

The only changes we did was to change "class" to "record" and comment out the call to the inherited constructor (as records do not inherit anything - and cannot have parameterless constructors). Without the need to free the enumerator instance, the code the compiler generates for the for-in loop is now much simpler.

  for MyObject in MyList do
mov edx,esp
mov eax,ebx
call TMyList.GetEnumerator
jmp $0041010d
mov eax,esp
call TMyListEnumerator.GetCurrent
mov ebx,eax
MyObject.Foo;
mov eax,ebx
call TMyObject.Foo
for MyObject in MyList do
mov eax,esp
call TMyListEnumerator.MoveNext
test al,al
jnz $004100fd

We can still see the three method calls to GetEnumerator, GetCurrent, Foo and MoveNext, but the heavy try-finally code and the Destroy call are gone. If we try to write this in Pascal again, it would be something like this.

procedure TestImpl;  
var
MyList: TMyList;
MyObject: TMyObject;
MyListEnumerator: TMyListEnumerator;
begin
MyList := TMyList.Create;
MyListEnumerator := MyList.GetEnumerator;
while MyListEnumerator.MoveNext do
begin
MyObject := MyListEnumerator.Current;
MyObject.Foo;
end;
end;

That is much better, don't you think!? ;)


Inlining the Loop Calls


While the class-to-record optimization gave some nice code size savings, for long running loops it doesn't really affect the overall running time, because it only affects what happens before and after the loop.


The next step is to turn on inlining the the small and simple enumerator functions. This is as easy as adding the inline directive to the declaration of the methods, like this.

  TMyListEnumerator = record
private
FIndex: Integer;
FMyList: TMyList;
public
constructor Create(AMyList: TMyList);
function MoveNext: Boolean; inline;
function GetCurrent: TMyObject; inline;
property Current: TMyObject read GetCurrent;
end;

Here we have just added "inline;" to the MoveNext and GetCurrent methods. Recompiling, running and hitting the break-point again, we can inspect the assembly code once again.

  for MyObject in MyList do
mov edx,esi
mov eax,ebx
call TMyList.GetEnumerator
jmp $00410222
mov eax,[esi+$04]
mov eax,[eax+$04]
mov edx,[esi]
mov ebx,[eax+edx*4]
MyObject.Foo;
mov eax,ebx
call TMyObject.Foo
for MyObject in MyList do
mov eax,esi
call TMyListEnumerator.MoveNext
test al,al
jnz $00410210

Notice that the GetCurrent call is now gone - instead the assembly instructions for its implementation have been merged into our loop. This is good as excessive branching inside a loop brings down performance. But notice that the MoveNext call is still there. For some reason the compiler is not heeding our inline directive for the MoveNext function.


While Expressions Not Inlined


As discussed in my DN4DP piece on Delphi inlining, the inline directive is just a hint that the compiler should try to inline the call if possible - there are a number of documented and undocumented cases where it will not be inlined. It looks like we have stumbled onto one of the undocumented cases here - the MoveNext function of an enumerator will not currently (D2007) be inlined in the code that the compiler generates for a for-in loop.


I've done some more testing and digging, and it seems like no functions are inlined if they appear inside a while loop control expression. Given some mock-up test code:

procedure Foo;
begin
end;

function Inlined(var I: integer): boolean; inline;
begin
Dec(I);
Result := I <> 0;
end;

We can now used this inlined function in a while loop.

procedure Test1;
var
I: integer;
begin
I := 100;
while Inlined(I) do
Foo;
end;

If the inlining worked we should see no call to the Inlined function in the generated assembly code.

  I := 100;
mov [esp],$00000064
jmp $0040fdc7
Foo;
call Foo
while Inlined(I) do
mov eax,esp
call Inlined
test al,al
jnz $0040fdc2

Alas, there is clearly a "call Inlined" in there :-(. 


Transforming a While Loop


Let's experiment a little. Testing shows that the Inlined routine is properly inlined for a simple if-test. It is possible to rewrite a while loop with an expression into a while-true loop with an if-statement on the negated expression and a Break. We can rewrite the non-inlining while loop in Test1 with a semantically equivalent loop.

procedure Test2;
var
I: integer;
begin
I := 100;
while True do
begin
if not Inlined(I) then
Break;
Foo;
end;
end;

Looking at the generated code again (we're really getting the hang of this, right?;)), we can see that the inlining does work just fine in this loop.

  I := 100;
mov [esp],$00000064
if not Inlined(I) then
dec dword ptr [esp]
cmp dword ptr [esp],$00
setnz bl
test bl,bl
jz $0040fdf2
TestInlining.dpr.57: Foo;
call Foo
while True do
jmp $0040fddd

This shows that the compiler is able to inline loop control like this - it just needs a little assistance ;).


Quality Central Reports


CodeGear could fix this in two ways;



  • Fix the compiler so that while-loop expressions can be inlined (this is the best solution). This should then automatically also inline the MoveNext call that the compiler generates for for-in loops.

  • If this is hard or impossible for some reason, at least the for-in loop could be changed to generate a while-true loop with if not MoveNext then Break; logic. Thus would ensure that for-in loops can become more efficient than today.

While they are at it, they could also:



  • Refactor all existing RTL and VCL enumerators from class to record

  • Inline all GetCurrent and MoveNext functions in all enumerators

Yes, I know should probably log these issues in Quality Central. And I will - eventually ;).


Update: I have now taken the time to report these issues and suggestions in Quality Central. Note: I think the first QC report ended up as a Beta report and may thus not be publically viewable - but it seems CodeGear have noticed anyway ;).


QC#53623 - Function calls inside while expressions are not inlined


QC#53737 - For-in codegen: Enumerator MoveNext calls are not inlined


QC#53738 - Change all RTL and VCL enumerators from class to record


QC#53739 - Inline GetCurrent and MoveNext in all RTL and VCL enumerators

17 comments:

gabr42 said...

Great stuff, as usual!

Unknown said...

Excellent write-up! And, for the record, the for-in syntax preceded the addition of methods on records. Not that that is any kind of excuse :-)
You should most definitely get the issues you discovered and the suggestion to use records instead of classes into QC.

Anonymous said...

I think the MoveNext inline problem is document in the D2007 help:

"Procedures and functions used in conditional expressions in while-do and repeat-until statements cannot be expanded inline."

OK, I'll admit that I didn't understand this until I read your blogpost ;-)

Anonymous said...

Can you use a TInferfacedObject for your enumerator?

Anonymous said...

Really great!

Interesting but fact that an Enumerator could also be an Interface, not only a Class or a Record (this fact is documented). Also an Enumerator could be applied to a Record and to an Interface, not only to a Class (this fact is undocumented, I have several QC reports about it).

I would be great to have deep research of these cases also :p

Really it was surprise for me that record enumerators are faster than class enumerators!

Anonymous said...

It would be interesting to see a comparison of performance w.r.t old-style for i := 0 to Pred(Count) style loops.

It seems that this post highlights something we learned even from VB for-in.... cute syntax labour savers often come with a performance penalty and ultimately require more thought (and work arounds) to ensure their use does not have unexpected negative consequences.

Of course, fixing the compiler (or rather "changing" it - it aint broke after all) would remove the need for some of the extra thought.

Q: Is there a reverse-for-in? A for-in-step? A for-in-from-X-to-Y (for a subrange of the contents of a list)?

gabr42 said...

@Jolyon Smith:

You can write for-in-step, for-downto and other custom enumerators yourself. See the third part in my enumerator series - http://17slon.com/blogs/gabr/2007/03/fun-with-enumerators-part-3.html.

Anonymous said...

See Enumerator and IEnumVariant -Delphi Win32 (French language) :
http://laurent-dardenne.developpez.com/articles/Delphi/2005/langage/win32/iterateurIEnumVariant/

Anonymous said...

You have a point, but using a record enumerator also makes it impossible to derive from it.

This means that your change will break the following code:

TShapeEnumerator = class(TListEnumerator)
public
function GetCurrent: TShape;
property Current: TShape read GetCurrent;
end;

TShapeList = class(TObjectList)
private
function GetItem(Index: Integer): TShape;
procedure SetItem(Index: Integer; const Value: TShape);
public
function GetEnumerator: TShapeEnumerator;
function Add(AObject: TShape): Integer;
property Items[Index: Integer]: TShape read GetItem write SetItem; default;
end;

Hallvards New Blog said...

Indeed.

That's why I wrote:
"I can see only two reasons to use classes for enumerators; if the class needs to free something in an overridden destructor, or if the class needs to inherit functionality from another class. In all other cases, enumerators should be implemented as records"

... ;)

Anonymous said...

Can you change your report QC53738 so that it won't break existing code?
It suggests changing TListEnumerator to a record, but that might not be such a good idea for everybody. Well, if we get generics for Win32, that might solve the problem with deriving from rtl record enumerators as well.

Hallvards New Blog said...

Ah - I get your point now. Changing the RTL/VCL enumerators from class to record *will* of course break existing code that inherits from the enumerators.

One solution would be to rewrite the child class to just use (as a private field) the RTL/VCL enumerator record instead of inheriting from it (aggregation instead of inheritance).

Then you could change your enumerator implementation to a record as well.. :)

elector said...

Isn't this really an implementation of GoF Iterator Pattern? I'm tying to do one in D2007.
Thank you

elector said...

I saw this question in comments but haven't seen an answer: Can this TMyList = class(TList) be implemented for TInterfaceList ?
It would be very useful.

Thank you

Hallvards New Blog said...

elector:

Not sure what you mean.

TInterfaceList already implements an enumerator and thus does support for-in loops.

If you're asking if the TInterfaceList enumerator could be optimized to a record instead of a class, yes I think so.

But you may not be gaining too much improvement, as handling interface references (with automatic reference counting etc) will probably still be going on.

elector said...

@Hallvard
I have tried to use the TINterfaceList but as I have found it is only possible from D2009 and I am using D2007.

http://chrisbensen.blogspot.com/2008/11/using-tinterfacelist-with-for-in-syntax.html

Thank you.

Hallvards New Blog said...

You should be able to use for-in with a class based TInterfaceList instance in Delphi 2007.

Chris' article documents that you'll need to use Delphi 2009 or later if you want to use for-in with an *interface* based IInterfaceList.

I suggest you show the code you have problems getting to work in D2007.



Copyright © 2004-2007 by Hallvard Vassbotn