Hamster Emporium

by Greg Parker, runtime wrangler and general specialist

[objc explain]: So you crashed in objc_msgSend(): iPhone Edition (2009-06-08 11:40 PM)

"So you crashed in objc_msgSend()" has been updated with register usage for the iPhone's ARM processor. The table now looks like this:

          objc_msgSend
          objc_msgSend_fpret     objc_msgSend_stret
          receiver   SEL         receiver   SEL
i386      eax*       ecx         eax*       ecx
x86_64    rdi        rsi         rsi        rdx
ppc       r3         r4          r4         r5
ppc64     r3         r4          r4         r5
arm       r0         r1          r1         r2

Valgrind for Mac OS X goes mainline (2009-06-03 7:20 PM)

Thanks to the heroic work of Nicholas Nethercote and Julian Seward, the Mac OS X port of Valgrind is now available on Valgrind's trunk. This is a big step forward towards an official Valgrind release with Mac OS X support.

For those of you with Snow Leopard seeds, Valgrind won't work. Valgrind operates at the low-level unsupported guts of the kernel/Libc interface. When the kernel and Libc change, Valgrind needs to adapt or die. Valgrind support for Snow Leopard will not be available until the open-source release of Snow Leopard's kernel and Libc at the earliest, which in turn is not before Snow Leopard itself ships.

[objc explain]: Monomorphic dispatch (2009-05-14 8:04 PM)

Polymorphic dispatch means a single call site could branch to one of several different implementations. C function calls are not polymorphic; Objective-C methods and C++ virtual methods are polymorphic.

The monomorphic dispatch optimization is used when a call site could call different implementations in principle, but can only ever call one particular implementation in reality. Then the optimizer can eliminate the polymorphic dispatcher's overhead and jump directly to the right place or even inline the callee locally. This is a classic optimization for dynamic-compiled runtimes from Smalltalk to Java and JavaScript.

There are some complications. First is dynamically-loaded code like shared libraries or eval() operations. If your new code provides a second implementation of a call that was previously monomorphic, you need to be able to undo the previous optimization on the fly and recompile it or fall back to an interpreter. Any dynamic compiler worth the name can do this nowadays.

Second, the area to search for additional implementations depends on the type-strictness of your language. If the compile-time type of the receiver rigidly defines the allowed runtime types, then the implementation need only be unique within that part of the hierarchy for the optimization to work. Window.title and Employee.title wouldn't interfere with each other at a strictly-typed call site whose receiver is of type Employee (or a subclass thereof).

How does Objective-C fit in here? In general, it doesn't. The monomorphic optimization is hard to apply to Objective-C. The two problems above loom large because of the language's definition, even if it suddenly acquired a runtime recompiler tomorrow.

Objective-C's call sites are not type-strict. The code may say the receiver is of some type, but at runtime it could actually be a Distributed Objects proxy or a unit test mock or a scripting bridge shim. You'd have to look at all classes to decide if a selector has multiple implementations, instead of searching only a subtree of the class hierarchy.

Even worse, there are zero selectors that have only a single implementation. They all have at least two: the one that exists in some class, and the one from every other class that calls -forwardInvocation:. You can never jump directly to any implementation, because if your receiver object is of the wrong type then you need to call the forwarding machinery instead. And checking the receiver's type quickly eats any optimization profit; you can only make a handful of checks before your cost is the same as objc_msgSend().

There are some important cases where monomorphic dispatch would still work in Objective-C. The container classes in particular have only one or two real implementations, so a receiver type check could be fast enough. And in other places you can make a single relatively expensive type check but then re-use the result many times, such as a series of [self ...] calls. The tricky part is identifying which selectors and call sites would optimize well, without taking too much time or memory to do so.
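The guarded direct call described above can be sketched in plain C. This is only a toy: the Class and Object structs, slow_send(), and count_fast() are made-up stand-ins rather than the real runtime, and each toy class implements exactly one selector. It shows the shape of the trick: one cheap pointer compare on the receiver's isa, a direct call if it matches, and the full dispatcher (including forwarding) if it does not.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature object model, not the real Objective-C runtime. */
typedef struct Object Object;
typedef int (*Method)(Object *self);

typedef struct Class {
    const char *name;
    const char *selector;   /* the single selector this toy class implements */
    Method      imp;
} Class;

struct Object {
    const Class *isa;
    int          count;
};

static int array_count(Object *self)   { return self->count; }
static int forward_count(Object *self) { (void)self; return -1; } /* stands in for forwarding */

static const Class ArrayClass = { "Array", "count", array_count };
static const Class ProxyClass = { "Proxy", "forwardInvocation:", forward_count };

/* Generic dispatcher: find the selector, or fall back to the forwarding path. */
static int slow_send(Object *receiver, const char *selector) {
    assert(receiver != NULL && receiver->isa != NULL);
    if (strcmp(receiver->isa->selector, selector) == 0)
        return receiver->isa->imp(receiver);
    return forward_count(receiver);
}

/* Call site specialized for the expected class. */
static int count_fast(Object *receiver) {
    assert(receiver != NULL);
    if (receiver->isa == &ArrayClass)
        return array_count(receiver);     /* guard passed: direct call, inlinable */
    return slow_send(receiver, "count");  /* guard failed: full dispatch */
}
```

Each extra class you want to handle directly adds another compare to the guard, which is why only a handful of checks fit before the generic dispatcher is just as cheap.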

The monomorphic dispatch optimization will be present in some future dynamic-recompiling Objective-C runtime, but it won't work as well as it does in other less-dynamic languages.

[objc explain]: Classes and metaclasses (2009-04-14 08:35 PM)

Objective-C is a class-based object system. Each object is an instance of some class; the object's isa pointer points to its class. That class describes the object's data: allocation size and ivar types and layout. The class also describes the object's behavior: the selectors it responds to and instance methods it implements.

The class's method list is the set of instance methods, the selectors that the object responds to. When you send a message to an instance, objc_msgSend() looks through the method list of that object's class (and superclasses, if any) to decide what method to call.

Each Objective-C class is also an object. It has an isa pointer and other data, and can respond to selectors. When you call a "class method" like [NSObject alloc], you are actually sending a message to that class object.

Since a class is an object, it must be an instance of some other class: a metaclass. The metaclass is the description of the class object, just like the class is the description of ordinary instances. In particular, the metaclass's method list is the class methods: the selectors that the class object responds to. When you send a message to a class - an instance of a metaclass - objc_msgSend() looks through the method list of the metaclass (and its superclasses, if any) to decide what method to call. Class methods are described by the metaclass on behalf of the class object, just like instance methods are described by the class on behalf of the instance objects.

What about the metaclass? Is it metaclasses all the way down? No. A metaclass is an instance of the root class's metaclass; the root metaclass is itself an instance of the root metaclass. The isa chain ends in a cycle here: instance to class to metaclass to root metaclass to itself. The behavior of metaclass isa pointers rarely matters, since in the real world nobody sends messages to metaclass objects.

More important is the superclass of a metaclass. The metaclass's superclass chain parallels the class's superclass chain, so class methods are inherited in parallel with instance methods. And the root metaclass's superclass is the root class, so each class object responds to the root class's instance methods. In the end, a class object is an instance of (a subclass of) the root class, just like any other object.

Confused? The diagram may help. Remember, when a message is sent to any object, the method lookup starts with that object's isa pointer, then continues up the superclass chain. "Instance methods" are defined by the class, and "class methods" are defined by the metaclass plus the root (non-meta) class.
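If you prefer code to diagrams, here is a small C model of the same graph. The Cls struct, the four objects, and the selector names are invented for illustration, and each toy class defines a single method; the real runtime structures are more involved. The lookup rule is the one stated above: start at the receiver's isa, then walk the superclass chain.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the isa/superclass graph; real runtime structures differ. */
typedef struct Cls {
    const char *name;
    struct Cls *isa;         /* for a class object: its metaclass */
    struct Cls *superclass;  /* NULL only for the root class */
    const char *method;      /* the one selector defined here, or NULL */
} Cls;

static Cls RootClass, RootMeta, SubClass, SubMeta;

static void build_graph(void) {
    RootClass = (Cls){ "Root",      &RootMeta, NULL,       "description" };
    RootMeta  = (Cls){ "Root meta", &RootMeta, &RootClass, "alloc" };
    SubClass  = (Cls){ "Sub",       &SubMeta,  &RootClass, "title" };
    SubMeta   = (Cls){ "Sub meta",  &RootMeta, &RootMeta,  "sharedInstance" };
}

/* Walk the superclass chain from start; return the class that defines selector. */
static const Cls *lookup(const Cls *start, const char *selector) {
    for (const Cls *c = start; c != NULL; c = c->superclass)
        if (c->method != NULL && strcmp(c->method, selector) == 0)
            return c;
    return NULL;
}

/* A message sent to a class object starts its search at the metaclass. */
static const Cls *lookup_class_method(const Cls *cls, const char *selector) {
    assert(cls != NULL);
    return lookup(cls->isa, selector);
}
```

After build_graph(), a class-method lookup on SubClass searches SubMeta, then RootMeta, then RootClass. That last hop is why a class object also responds to the root class's instance methods, while Sub's own instance method (title here) is never on the path.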

In proper computer science language theory, a class and metaclass hierarchy can be more free-form, with deeper metaclass chains and multiple classes instantiated from any single metaclass. Objective-C uses metaclasses for practical goals like class methods, but otherwise tends to hide metaclasses. For example, [NSObject class] is identical to [NSObject self], even though in formal terms it ought to return the metaclass that NSObject->isa points to. The Objective-C language is a set of practical compromises; here it limits the class schema before it gets too, well, meta.

[objc explain]: Non-fragile ivars (2009-01-27 09:30 PM)

Non-fragile instance variables are a headline feature of the modern Objective-C runtime available on iPhone and 64-bit Mac. They provide framework developers more flexibility without losing binary compatibility, and pave the way for automatically-synthesized property ivars and ivars declared outside a class's interface.

The fragile base class problem

Fragile ivars are a subset of the classic fragile base class problem. In some languages, a superclass cannot be changed without also recompiling all subclasses of that class. For example, adding data members or virtual member functions to a C++ superclass will break binary compatibility with any subclass of that class, even if the added members are private and invisible to the subclass.

In classic Objective-C, methods are mostly non-fragile, thanks to dynamic message dispatch. You can freely add methods to a superclass, as long as you don't have name conflicts. But Objective-C ivars are fragile on 32-bit Mac.

32-bit Mac: fragile Objective-C ivars

Say you're writing the world's next great pet shop application for Mac OS X Leopard. You might have this PetShopView subclass of NSView, with arrays for the puppies and kittens in the pet shop.

NSView (Leopard)
0 Class isa
4 NSRect bounds
20 NSView *superview
24 NSColor *bgColor
PetShopView
0 Class isa
4 NSRect bounds
20 NSView *superview
24 NSColor *bgColor
28 NSArray *kittens
32 NSArray *puppies

Then Mac OS X Def Leopard comes out, with its new multi-paw interface technology. The AppKit developers add some paw-tracking code to NSView.

NSView (Def Leopard)
0 Class isa
4 NSRect bounds
20 NSView *superview
24 NSColor *bgColor
28 NSSet *touchedPaws
PetShopView
0 Class isa
4 NSRect bounds
20 NSView *superview
24 NSColor *bgColor
28 NSArray *kittens
32 NSArray *puppies

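Unfortunately, your kittens are doomed by fragile ivars: NSView's new touchedPaws ivar sits at offset 28, exactly where your already-compiled PetShopView code expects to find kittens. Alternatively, the AppKit developers are trapped with whatever ivars they chose in Mac OS X 10.0.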

iPhone and 64-bit Mac: non-fragile Objective-C ivars

What you and the AppKit developers really want is something like this.

NSView (Def Leopard)
0 Class isa
4 NSRect bounds
20 NSView *superview
24 NSColor *bgColor
28 NSSet *touchedPaws
PetShopView
0 Class isa
4 NSRect bounds
20 NSView *superview
24 NSColor *bgColor
28 NSSet *touchedPaws
32 NSArray *kittens
36 NSArray *puppies

Here, the runtime has recognized that NSView is now larger than it was when PetShopView was compiled. The subclass ivars slide in response, without recompiling any code, and the kittens are saved by a dynamic runtime.

How it works

The generated code for classic Objective-C ivar access works like a C struct field. The offset to the ivar is a constant determined at compile time. The new ivar code instead creates a variable for each ivar which contains the offset to that ivar, and all code that accesses the ivar uses that variable. At launch time, the runtime can change any ivar offset variable if it detects an oversize superclass.

In the pet shop example, _OBJC_IVAR_PetShopView_kittens is 28 at compile time, but the runtime changes it to 32 when it sees the Def Leopard version of NSView. No code needs to be recompiled, and the performance overhead of the extra ivar offset variable is small. AppKit is happy, you're happy, and the kittens are happy.
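Here is a rough C sketch of that mechanism, using the numbers from the pet shop example. The variable and function names are made up (the real compiler-emitted symbols and the runtime's launch-time fixup look different), and the sketch assumes the fixup runs exactly once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the compiler-emitted ivar offset variables.
   Initial values are the offsets as compiled against Leopard's NSView. */
static ptrdiff_t ivar_offset_kittens = 28;
static ptrdiff_t ivar_offset_puppies = 32;

/* Superclass instance size that PetShopView was compiled against. */
enum { COMPILED_SUPER_SIZE = 28 };

/* At launch (once), slide the subclass ivars if the superclass has grown. */
static void slide_ivars(ptrdiff_t runtime_super_size) {
    assert(runtime_super_size >= COMPILED_SUPER_SIZE);
    ptrdiff_t delta = runtime_super_size - COMPILED_SUPER_SIZE;
    ivar_offset_kittens += delta;
    ivar_offset_puppies += delta;
}

/* Every ivar access reads the offset variable instead of a baked-in constant. */
static char *ivar_address(void *object, ptrdiff_t offset) {
    return (char *)object + offset;
}
```

Calling slide_ivars(32) models launching against the Def Leopard NSView: kittens moves to 32 and puppies to 36, and code that goes through ivar_address() picks up the new offsets without being recompiled.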

Infinity isn't as long as you think (2008-11-30 3:29 AM)

My intuition about infinite time used to go something like this: every possible event will happen, if you wait long enough.

Consider a random walk on a 2-D grid. Will the drunkard return to the starting point? Yes, if you wait long enough: you can prove that the random walk will, eventually, return. (Formally, it returns "almost surely" or "with probability 1". For example, the random walk could proceed due east forever and never return; it's not impossible, just infinitely improbable.)

In fact, you can prove the same thing about every other point on the grid, not just the start. Each point will ("almost surely") be visited during the random walk. If you wait long enough, the drunkard will wander everywhere. Score a point for the "everything happens eventually" idea.

But that intuition is wrong. Forever is not always enough.

Move the drunkard to a 3-D grid. Now the probability of returning to the start is not "almost sure". Instead, it's a little over one-third. If you start three drunkards on a 3-D random walk, you would expect only one of them to ever come back, even if you wait forever. And the odds are even worse with more dimensions.

Some events still don't happen after infinite time. Infinity isn't as long as you think.
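You can get a feel for the difference with a quick simulation. The C sketch below is an illustration, not a proof: the function names are mine, the generator is a simple fixed-seed LCG so runs are repeatable, and every walk is cut off after max_steps, so it can only underestimate the true return probability (badly so in 2-D, where returns can take a very long time).

```c
#include <assert.h>
#include <stdint.h>

/* Small deterministic generator (64-bit LCG, Knuth's MMIX constants). */
static uint64_t rng_state = 1;

static uint32_t next_random(void) {
    rng_state = rng_state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (uint32_t)(rng_state >> 33);   /* keep the better-mixed high bits */
}

/* Walk up to max_steps on a dims-dimensional grid.
   Returns 1 if the walker revisits the origin, 0 if it has not by the cutoff. */
static int returns_to_origin(int dims, int max_steps) {
    assert(dims >= 1 && dims <= 3);
    int pos[3] = { 0, 0, 0 };
    for (int step = 0; step < max_steps; step++) {
        uint32_t r = next_random() % (uint32_t)(2 * dims);
        pos[r / 2] += (r % 2) ? 1 : -1;   /* pick an axis, then a direction */
        if (pos[0] == 0 && pos[1] == 0 && pos[2] == 0)
            return 1;
    }
    return 0;
}

/* Fraction of walkers that came back within max_steps. */
static double return_fraction(int dims, int trials, int max_steps) {
    int returned = 0;
    for (int i = 0; i < trials; i++)
        returned += returns_to_origin(dims, max_steps);
    return (double)returned / trials;
}
```

Comparing return_fraction(2, trials, steps) with return_fraction(3, trials, steps) while raising the step limit shows the 2-D fraction creeping upward and the 3-D fraction levelling off near one-third.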

http://en.wikipedia.org/wiki/Random_walk
http://mathworld.wolfram.com/PolyasRandomWalkConstants.html

[objc explain]: objc_msgSend_fpret (2008-11-16 7:00 PM)

objc_msgSend is the Objective-C message dispatcher. objc_msgSend_stret is exactly the same, but for methods that return values of struct types. And objc_msgSend_fpret is for methods that return floating-point values on some architectures.

          objc_msgSend_fpret return types
i386      float, double, long double
x86_64    long double
ppc       none (identical to objc_msgSend)
ppc64     none (identical to objc_msgSend)
arm       none (identical to objc_msgSend)

objc_msgSend_stret exists because the parameters are passed in different places for struct-returning functions. That's not the problem that objc_msgSend_fpret solves. Instead, objc_msgSend_fpret exists to handle the return value itself correctly. Specifically, the return value of a message sent to nil on i386 and x86_64.

Messages to nil return zero if possible. On ppc, objc_msgSend with a nil receiver clears registers r3, r4, f1, and f2 before returning. This means any pointer or integer or floating-point return value will be zero, and structs are undefined. On ppc, objc_msgSend_fpret is unnecessary, because clearing f1 and f2 is harmless if the caller is actually expecting a value in r3 or r4.

i386 is different. The floating-point registers there are historically derived from the 8087 FPU for the original 8086 CPU. The x87 unit is an odd beast: it has eight floating-point registers, but the registers themselves are used as if they were a stack. Values are pushed and popped from this stack even though the stack is stored in registers and has a maximum of eight entries.

For return values, the callee pushes the value on the x87 stack and the caller pops it. This is fine for C functions, but not so good when objc_msgSend returns zero. objc_msgSend does not know what return type the caller really wants, and it must not push a zero on the x87 stack unless the caller expects to pop a floating-point value.

The solution is objc_msgSend_fpret. During a message to nil on i386, objc_msgSend_fpret pushes a zero on the x87 stack which the caller will pop, and objc_msgSend does not. The compiler knows which return type the caller expects, and uses the matching dispatcher. On ppc, ppc64, and arm, objc_msgSend_fpret is identical to objc_msgSend and is usually unused.

What about x86_64? This architecture is one step forward, two steps back. The good news is that return values of types float and double are returned in the XMM registers. objc_msgSend can handle that itself just like ppc. The bad news is that long double still uses the x87 stack. So objc_msgSend_fpret still exists, but is only used for long double on x86_64. The worse news is that C99's complex long double returns two values on the x87 stack. So now there's objc_msgSend_fp2ret for that case only. Currently no compiler actually uses objc_msgSend_fp2ret, so hopefully nobody is writing code that sends a message to nil on x86_64 and expects a zero return value of type complex long double.
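The choice between the two dispatchers boils down to a small lookup. The C sketch below just restates the table at the top of this post; the enum and function names are invented for illustration, and it ignores struct returns and the complex long double case entirely.

```c
#include <assert.h>

typedef enum { ARCH_I386, ARCH_X86_64, ARCH_PPC, ARCH_PPC64, ARCH_ARM } Arch;
typedef enum { RET_INTEGER_OR_POINTER, RET_FLOAT, RET_DOUBLE, RET_LONG_DOUBLE } ReturnKind;
typedef enum { DISPATCH_MSGSEND, DISPATCH_FPRET } Dispatcher;

/* Which dispatcher fits a non-struct return type, per the table in this post. */
static Dispatcher dispatcher_for(Arch arch, ReturnKind kind) {
    switch (arch) {
    case ARCH_I386:
        /* every floating-point type comes back on the x87 stack */
        return kind == RET_INTEGER_OR_POINTER ? DISPATCH_MSGSEND : DISPATCH_FPRET;
    case ARCH_X86_64:
        /* float and double use XMM registers; only long double uses x87 */
        return kind == RET_LONG_DOUBLE ? DISPATCH_FPRET : DISPATCH_MSGSEND;
    case ARCH_PPC:
    case ARCH_PPC64:
    case ARCH_ARM:
        /* fpret is identical to objc_msgSend here */
        return DISPATCH_MSGSEND;
    }
    assert(!"unknown architecture");
    return DISPATCH_MSGSEND;
}
```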

One last thing: Mac OS X 10.4 and earlier did not return zero for floating-point messages to nil on ppc. Be careful if you're writing code for those older systems. All other architectures have always returned floating-point zero, including i386 on 10.4.

[objc explain]: objc_msgSend_stret (2008-10-30 10:28 PM)

objc_msgSend is the Objective-C message dispatcher. It's the function-calling function, using the selector and the receiver object's class to decide where to jump to. objc_msgSend_stret is exactly the same, but for methods that return values of struct types. Why does objc_msgSend_stret exist? Because the machine-level guts of the C language require it, and Objective-C methods are C functions if you tilt your head and squint a bit.

On most processors, the first few parameters to a function are passed in CPU registers, and return values are handed back in CPU registers. Objective-C methods do the same, but with id self and SEL _cmd as the first two parameters. Here's a PowerPC example:

    -(int) method:(id)arg;
        r3 = self
        r4 = _cmd, @selector(method:)
        r5 = arg
        (on exit) r3 = returned int

CPU registers work fine for small return values like ints and pointers, but structure values can be too big to fit. For structs, the caller allocates stack space for the returned struct, passes the address of that storage to the function, and the function writes its return value into that space. The address of the struct is an implicit first parameter just like self and _cmd:

    -(struct st) method:(id)arg;
        r3 = &struct_var (in caller's stack frame)
        r4 = self
        r5 = _cmd, @selector(method:)
        r6 = arg
        (on exit) return value written into struct_var

Now consider objc_msgSend's task. It uses _cmd and self->isa to choose the destination. But self and _cmd are in different registers if the method will return a struct, and objc_msgSend can't tell that in advance. Thus objc_msgSend_stret: just like objc_msgSend, but reading its values from different registers.
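In C terms, the two register layouts correspond to two different function shapes. The sketch below is an illustration only: struct st, the function names, and the int argument (standing in for the id argument above) are made up, and a real compiler performs this lowering invisibly. The point is that self and _cmd each move over by one slot once the hidden struct address is added.

```c
#include <assert.h>
#include <stddef.h>

struct st { int a, b, c, d; };

/* What the programmer writes: a method body returning a struct by value.
   self and cmd are the two implicit Objective-C parameters. */
static struct st method_by_value(void *self, const char *cmd, int arg) {
    (void)self;
    (void)cmd;
    struct st result = { arg, arg + 1, arg + 2, arg + 3 };
    return result;
}

/* Roughly what a stack-returning ABI makes of it: the caller passes the
   address of its own storage as a hidden first parameter, so self is now
   the second parameter and cmd the third. */
static void method_lowered(struct st *out, void *self, const char *cmd, int arg) {
    assert(out != NULL);
    *out = method_by_value(self, cmd, arg);
}
```

A dispatcher that expects the first shape would read the struct address as if it were self, which is exactly the mix-up objc_msgSend_stret exists to avoid.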

But there's a catch.

On most architectures, some small C structs are returned in registers after all, instead of using the struct-address first parameter that objc_msgSend_stret expects. If the struct type falls into this category, then objc_msgSend is used instead. So the "struct return" part of objc_msgSend_stret refers to the architecture's definition of a stack-returned struct, which may not match what C calls a struct.

The rules for which struct types return in registers are always arcane, sometimes insane. ppc32 is trivial: structs never return in registers. i386 is straightforward: structs with sizeof exactly equal to 1, 2, 4, or 8 return in registers. x86_64 is more complicated, including rules for returning floating-point struct fields in FPU registers, and ppc64's rules and exceptions will make your head spin. The gory details are documented in the Mac OS X ABI Guide, though as usual if the documentation and the compiler disagree then the documentation is wrong.
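As a taste of what such a rule looks like, here is the i386 one written out in C. It encodes only the one-sentence summary above, with invented function names; treat it as a sketch, and trust the compiler's actual output over it.

```c
#include <assert.h>
#include <stddef.h>

/* i386 rule as summarized in this post: structs whose sizeof is exactly
   1, 2, 4, or 8 bytes are returned in registers. */
static int i386_struct_returns_in_registers(size_t size) {
    return size == 1 || size == 2 || size == 4 || size == 8;
}

/* Hypothetical helper: nonzero if a method returning a struct of this size
   would be dispatched through objc_msgSend_stret on i386. */
static int i386_uses_stret(size_t size) {
    assert(size > 0);
    return !i386_struct_returns_in_registers(size);
}
```

Note that a 3-byte struct goes through objc_msgSend_stret even though it is smaller than a 4-byte one that does not; the rule is about exact sizes, not "small enough".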

If you're calling objc_msgSend directly and need to know whether to use objc_msgSend_stret for a particular struct type, I recommend the empirical approach: write a line of code that calls your method, compile it on each architecture you care about, and look at the assembly code to see which dispatch function the compiler uses.

updated: Valgrind for Mac OS X (2008-10-27 5:47 PM)

Update for Valgrind for Mac OS X: fixes a hang at launch when reading debug info.

The entire difference between valgrind-opensource-3 and valgrind-opensource-4 is:

    -#if !defined(VGO_DARWIN)
    +#if !defined(VGO_darwin)
D'oh!

Space is time: how your CS theory class lied to you (2008-10-14 1:04 AM)

In your computer science algorithms course, you learned about space-time tradeoffs. An algorithm that requires lots of time can often be changed to take less time but more space. A wide range of performance optimizations work this way, from caching to memoization to loop unrolling.

But the "tradeoff" is a lie. Space is time.

Every use of space incurs a time cost. In your theory class, the time cost of space is swept under the big-O rug. On your big-iron machine running a single computational workload or CPU benchmark, the time cost of space is small compared to the other time costs involved. But in the real world of consumer-grade devices, with limited memory and power, the time cost of space is tremendous. A performance optimization that tries to trade less time for more space often ends up requiring more time and more space.

The gcc compiler uses a garbage collector to manage its memory. To save time, gcc does not even start to collect any garbage until its memory size is quite large. "That's fine", you might say, "my new machine has gigabytes of memory". But the kernel needs a big chunk of memory just to keep track of the rest of the memory. And you're running a web browser, and an email client, and an IDE, and music and chat and clock and search and sync and backup and everything else you didn't have a decade or two ago. And your build system runs multiple gcc commands in parallel because your machine has multiple cores. Now your memory capacity isn't so big after all, the system starts paging to disk, and your compiler performance falls off a cliff and the web browser is sluggish too. In this memory-constrained environment, trying to use less time (skipping GC) and more space (accumulating garbage) has backfired badly.

Space is time. An optimization that is faster on a well-endowed device may be much slower everywhere else. Assume your customers' machines have less memory than yours, and design and test accordingly.

At one modern extreme, the iPhone has only 128 MB of memory. Ever seen iPhone Safari "forget" a web page and re-download it after you switched tabs or apps? The system ran out of memory and Safari had to throw the page away. On the iPhone, your favorite space-time tradeoff in your own program may sacrifice the user's web page, requiring a repeat download across a slow network. Good for your program, perhaps, but bad for the user.

Space is time. An optimization that makes your program faster may make the user's system slower overall. Play well with others.

Most of Mac OS X is compiled with -Os instead of -O3, to reduce code size. Mac OS X's memory allocator is slower than other allocators under some workloads, because it tries to avoid hoarding unused memory where other processes can't use it. Mac OS X uses dynamic shared libraries exclusively, then combines multiple shared libraries into a single shared cache, then carefully re-processes that shared cache, all to save space across multiple processes. Many ideas for faster cross-library calls or accelerated Objective-C method dispatch or JIT-based optimization have been abandoned because they need too much space and do not save enough time.

CPU-focused optimization can be just as evil as the infamous premature optimization. Space is time.

Greg Parker
gparker-web@sealiesoftware.com
Sealie Software