Hamster Emporium

by Greg Parker, runtime wrangler and general specialist

(link) Do-it-yourself Objective-C weak import   (2010-4-8 10:23 PM)
 

WARNING DANGER HAZARD BEWARE EEK

The scheme described herein is UNTESTED and probably BUGGY. Use at your own risk.

Executive summary

The Objective-C runtime supports weak-imported classes back to iPhone OS 3.1. An app could use a class added in iPhone OS 3.2 or 4.0 and still run on 3.1. The app would check if [SomeClass class] is nil and act accordingly.

Unfortunately, the compilers and class declarations in framework headers do not support weak import yet. But you may be able to use weak linking anyway, by adding the right incantations yourself.

To use a class SomeClass that is unavailable on some of your app's deployment targets, write this in every file that uses the class:

    asm(".weak_reference _OBJC_CLASS_$_SomeClass");

To subclass a class SomeClass that is unavailable on some of your app's deployment targets, write this in the file containing your subclass's @implementation:

    asm(".weak_reference _OBJC_CLASS_$_SomeClass");
    asm(".weak_reference _OBJC_METACLASS_$_SomeClass");

This will not work for apps running on iPhone OS 3.0 or older. Only iPhone OS 3.1 and newer have any hope of success. Of course, since this is UNTESTED it may not work there either.

How it works

Say you're writing a game, and want to use the hypothetical UIDancePad class added to iPhone OS 3.2. (Do not dance on iPad.) When you use class UIDancePad in your code, the compiler emits a C symbol pointing to the class:

    .long _OBJC_CLASS_$_UIDancePad

Since UIDancePad is in a framework instead of your code, the symbol remains undefined in your executable, as shown by `nm -m`:

    (undefined) external _OBJC_CLASS_$_UIDancePad (from DanceKit)

When you run on iPhone OS 3.2, everything works great: the dynamic loader opens your executable and DanceKit, and binds your undefined symbol to their class definition.

Things don't go so well on iPhone OS 3.1. DanceKit exists but does not define UIDancePad. The dynamic loader is unable to resolve your undefined symbol, and the process halts:

    dyld: Symbol not found: _OBJC_CLASS_$_UIDancePad
        Referenced from: /path/to/YourApp
        Expected in: /path/to/DanceKit

Weak import solves this. The compiled symbol reference is now a weak one:

    .weak_reference _OBJC_CLASS_$_UIDancePad
    .long _OBJC_CLASS_$_UIDancePad

    (undefined) weak external _OBJC_CLASS_$_UIDancePad (from DanceKit)

The dynamic loader shrugs its shoulders if a weak reference cannot be resolved, and sets the pointer to NULL. The Objective-C runtime sees the NULL pointer and fixes up the rest of the metadata as if UIDancePad never existed.
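The difference between the two binding behaviors can be modeled in a few lines of plain C. This is only a toy simulation, not dyld's actual code: the ExportedSymbol table, the BindResult values, and bind_symbol() are all hypothetical names invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the symbols a framework exports on one OS version. */
typedef struct {
    const char *name;
    void *address;
} ExportedSymbol;

/* Outcome of binding one undefined symbol reference. */
typedef enum { BIND_OK, BIND_NULL_WEAK, BIND_FAILED } BindResult;

/* Look the symbol up in the export list. A missing strong reference is a
   fatal error; a missing weak reference is bound to NULL instead. */
static BindResult bind_symbol(const ExportedSymbol *exports, size_t count,
                              const char *name, int is_weak, void **slot)
{
    for (size_t i = 0; i < count; i++) {
        if (strcmp(exports[i].name, name) == 0) {
            *slot = exports[i].address;
            return BIND_OK;
        }
    }
    *slot = NULL;
    /* A real loader would halt the process in the BIND_FAILED case. */
    return is_weak ? BIND_NULL_WEAK : BIND_FAILED;
}
```

Binding against an empty export list with is_weak set leaves the slot NULL, which is the value the app's [SomeClass class] check would then observe.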

As mentioned above, the compiler and framework header support is not yet in place. The incantations simply add the assembler directives that the compiler does not yet know how to emit:

    asm(".weak_reference _OBJC_CLASS_$_UIDancePad");

Et voilà: weak import of an Objective-C class. Well, maybe. I have only tested this on toy examples, none of which got anywhere close to any version of iPhone OS. Coder beware!

(What about the _OBJC_METACLASS symbol, you ask? When you subclass a class, your subclass's metaclass's superclass pointer points to the subclass's superclass's metaclass. In other words, your subclass's @implementation points to both its superclass and its superclass's metaclass. That requires two symbols: one for the class and one for the metaclass. When you simply use a class without subclassing it, you don't need the metaclass pointer.)

(link) [objc explain]: Weak-import classes   (2009-09-09 1:30 PM)
 

Weak-import classes are a useful new Objective-C feature that you can't use yet.

Weak import is a solution when you want to use something from a framework, but still need to be compatible with older versions of the framework that didn't support it yet. Using weak import you can test if the feature exists at runtime before you try to use it.

Objective-C has not previously supported weak import for classes. Instead you had to use clumsy runtime introspection to check whether a class was available, store a pointer to that class in a variable, and use that variable when you wanted to send a message to the class. Even worse, there was no reasonable way to create your own subclass of a superclass that might be unavailable. Some developers put the subclass in a separate library that was not loaded until after checking that the superclass was present, but even that trick is not allowed on iPhone OS.

Weak import for C functions works by checking the weak-imported function pointer's value before calling it:

    if (NSNewFunction != NULL) {
        NSNewFunction(...);
    } else {
        // NSNewFunction not supported on this system
    }

The same mechanism is a natural fit with Objective-C classes and Objective-C's handling of messages to nil. These constructs are much nicer than NSClassFromString() or a separate NSBundle.

    if ([NSNewClass class] != nil) {
        [NSNewClass doSomething];
    } else {
        // NSNewClass is unavailable on this system
    }

    @interface MySubclass : NSNewClass ... @end
    MySubclass *obj = [[MySubclass alloc] init];
    if (!obj) {
        // MySubclass (or a superclass thereof) is unavailable on this system
    }

Weak import of Objective-C classes is now available. But you can't use it yet. First, it's only supported today on iPhone OS 3.1; it's expected to arrive in a future Mac OS.

Second, there's nothing you can do with weak import until the first OS update after iPhone OS 3.1. Then you could write an app that adopted new features in that future version, and used weak import to be compatible with 3.1. (It still could not run on 3.0 or 2.x, because those systems lack the runtime machinery to process the weak import references.)

Weak import for Objective-C did not make Snow Leopard for scheduling reasons. Assuming it ships in Mac OS X 10.7 Cat Name Forthcoming, you won't be able to use it until Mac OS X 10.8 LOLcat.

(link) Colorized keyboard backlight   (2009-09-05 1:15 AM)
 

I use my MacBook Pro for astronomy. The backlit keyboard would be great in the dark, but its white light is bad for night vision. This mod makes it red, or any other color you want.

(link) [objc explain]: Selector uniquing in the dyld shared cache   (2009-09-01 2:10 AM)
 

Mac OS X Snow Leopard cuts in half the launch-time overhead of starting the Objective-C runtime, and simultaneously saves a few hundred KB of memory per app. This comes for free to every app, courtesy of one of the few pieces of Mac OS X that lives below even the Objective-C runtime: dyld.

dyld and the shared cache

dyld is the dynamic loader and linker. When your process starts, dyld loads your executable and its shared libraries into memory, links the cross-library C function and variable references together, and starts execution on its way towards main().

In theory a shared library could be different every time your program is run. In practice, you get the same version of the shared libraries almost every time you run, and so does every other process on the system. The system takes advantage of this by building the dyld shared cache. The shared cache contains a copy of many system libraries, with most of dyld's linking and loading work done in advance. Every process can then share that shared cache, saving memory and launch time.

(Incidentally, the shared cache beats the pants off the pre-Leopard prebinding system that was supposed to achieve the same optimizations. Remember the post-install "Optimizing System Performance" step that often took longer than the install itself? That was prebinding being updated. Rebuilding the shared cache is so blazingly fast that the installer doesn't bother to report it anymore.)

Objective-C selector uniquing

Leopard's dyld shared cache is great for C code, but it didn't do anything to help Objective-C's startup overhead. The single biggest launch cost for Objective-C is selector uniquing. The app and every shared library contain their own copies of selector names like "alloc" and "init". The runtime needs to choose a single canonical SEL pointer value for each selector name, and then update the metadata for every call site and method list to use the blessed unique value. This means building a big hash table (memory), calling strcmp() a lot (time), and modifying copy-on-write metadata (more memory).
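For a rough feel of what uniquing involves, here is a minimal C sketch. It is not the runtime's implementation: the table, unique_selector(), and fix_selector_refs() are hypothetical names, and a linear scan stands in for the real hash table.

```c
#include <stddef.h>
#include <string.h>

#define MAX_SELECTORS 64

/* Canonical selector names seen so far. A real runtime uses a hash table;
   a linear scan keeps this sketch short. */
static const char *unique_table[MAX_SELECTORS];
static size_t unique_count;

/* Return the one canonical pointer for this selector name, registering the
   name on first sight. Returns NULL if the table is full. */
static const char *unique_selector(const char *name)
{
    for (size_t i = 0; i < unique_count; i++) {
        if (strcmp(unique_table[i], name) == 0)
            return unique_table[i];
    }
    if (unique_count == MAX_SELECTORS)
        return NULL;
    unique_table[unique_count++] = name;
    return name;
}

/* Rewrite every selector reference in one image to the canonical pointer.
   Returns the number of references that had to be modified; each such
   write would dirty the page holding that reference. */
static size_t fix_selector_refs(const char **refs, size_t count)
{
    size_t changed = 0;
    for (size_t i = 0; i < count; i++) {
        const char *canonical = unique_selector(refs[i]);
        if (canonical != NULL && canonical != refs[i]) {
            refs[i] = canonical;
            changed++;
        }
    }
    return changed;
}
```

Once every reference has been fixed up, two selectors can be compared with a pointer comparison instead of strcmp(). The table, the string comparisons, and the rewritten references correspond to the three costs listed above.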

There are tens of thousands of unique selectors present in a typical process. If you run `strings /usr/lib/libobjc.dylib` on Leopard you can see the thirty-thousand-line built-in selector table that was a previous attempt to reduce the memory cost. Even so the cost goes up with every new class and method added to Cocoa.framework; left unchecked, an identical app would take longer to launch and use more memory after every OS upgrade.

The obvious solution? Do the work of selector uniquing in the dyld shared cache. Build a selector table into the shared cache itself, and update the selector references in the cached copy of the shared libraries. Then you save memory because every process shares the same selector table, and save time because the runtime does not need to rebuild it during every app launch. The runtime only needs to fix the selector references from the app itself. The catch? Selectors are too dynamic to be implemented as C symbols, so the shared cache construction tool needed to be taught how to read and write Objective-C's metadata.

Optimization WIN

Snow Leopard's dyld shared cache uniques Objective-C selectors, and Snow Leopard's Objective-C runtime recognizes when the selectors in a shared library are already uniqued courtesy of the shared cache. About half of the runtime's initialization time is eliminated, making warm app launch several tenths of a second faster. Typical memory savings are 200-500 KB per process, adding up to a few megabytes system-wide. When this optimization ships on the iPhone OS side, it's estimated to save 1 MB on a 128 MB device. The iPhone performance team would pay any number of arms and legs for that kind of gain.

You can watch the system in action with various debugging flags.

    $ sudo /usr/bin/update_dyld_shared_cache -debug -verify
    [...]
    update_dyld_shared_cache: for x86_64, uniquing objc selectors
    update_dyld_shared_cache: for x86_64, found 68761 unique objc selectors
    update_dyld_shared_cache: for x86_64, 541736/590908 bytes (91%) used in libobjc unique selector section
    update_dyld_shared_cache: for x86_64, updated 205230 selector references

    $ OBJC_PRINT_PREOPTIMIZATION=YES /usr/bin/defaults
    objc[424]: PREOPTIMIZATION: selector preoptimization ENABLED (version 3)
    objc[424]: PREOPTIMIZATION: honoring preoptimized selectors in /usr/lib/libobjc.A.dylib
    objc[424]: PREOPTIMIZATION: honoring preoptimized selectors in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
    objc[424]: PREOPTIMIZATION: honoring preoptimized selectors in /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/Metadata.framework/Versions/A/Metadata
    objc[424]: PREOPTIMIZATION: honoring preoptimized selectors in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation

You can estimate the memory savings with the allmemory tool. Record post-launch memory usage of an app run with and without environment variable OBJC_DISABLE_PREOPTIMIZATION=YES. Look for the count of dirty pages; each dirty page is 4 KB eaten by that process. With 64-bit TextEdit I see the dirty page count jump from 725 to 1069 after disabling the optimization. This is an overestimate - many of those pages would have been not-dirty in Leopard because of the old built-in selector table - but it does show the magnitude of the win.
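The arithmetic behind that TextEdit measurement, written out as a tiny C helper. The function name dirty_kb_saved() is made up for this note; the page counts and the 4 KB page size come from the paragraph above.

```c
/* Page size quoted in the text: each dirty page is 4 KB. */
enum { PAGE_SIZE_KB = 4 };

/* Memory (in KB) attributed to the difference between two dirty page counts. */
static long dirty_kb_saved(long pages_without_opt, long pages_with_opt)
{
    return (pages_without_opt - pages_with_opt) * PAGE_SIZE_KB;
}
```

dirty_kb_saved(1069, 725) works out to 344 pages times 4 KB, or 1376 KB. That is well above the typical 200-500 KB, which is consistent with the measurement being an overestimate.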

The Objective-C runtime does more than just selector uniquing during launch. Future improvements to the dyld shared cache may precompute some of that other work, to further improve launch time, save memory, and reduce the cost of linking to Objective-C code that you don't actually use. But selector uniquing as seen in Snow Leopard is by far the biggest bang for the buck.

(link) [objc explain]: Thread-local garbage collection   (2009-08-28 1:00 PM)
 

Mac OS X Snow Leopard introduces thread-local collection, a big enhancement to the Objective-C garbage collector. Thread-local collection is a more efficient way to reclaim much of the garbage in most programs. It also scales better to more threads and more cores than the other algorithms used by the Objective-C GC.

A brief history of garbage collection

The simplest algorithm used by the Objective-C GC is full collection. The GC scans all live objects in the entire heap, discovers (approximately) all dead objects, and reclaims them. This is slow, especially if you have a large population of not-dead objects, but does find all possible garbage. Historically this is mostly 1960-era technology, except for the machinery that allows other threads to run mostly unhindered while the scan completes.

The second algorithm used by the Objective-C GC is generational collection. This takes advantage of the generational hypothesis: most objects die young. The heap is divided into at least two generations: new objects and old objects. After allocating some amount of new objects, the collector runs a generational collection. First, it scans only the "new" objects and any "old" objects into which a pointer to a "new" object was stored. Then the now-dead "new" objects are reclaimed, and the surviving "new" objects are aged, moving them to the next generation. The advantage of generational GC is that it collects lots of garbage (most objects die young) with much less work (it does not need to scan most of the "old" objects). Full collections are still needed to reclaim objects that survive infancy and die later, but run less often. Generational collection is 1980-era technology.
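The generational scan can be sketched as a single-threaded toy in C. This is an illustration under heavy simplifying assumptions, not the Objective-C collector: every object has exactly one outgoing pointer, the heap is a small fixed array, and all names (Object, gen_store(), gen_collect(), and so on) are hypothetical.

```c
#include <stddef.h>

#define HEAP_SIZE 16

typedef struct Object {
    int in_use;          /* slot holds a live allocation */
    int old;             /* 0 = "new" generation, 1 = "old" generation */
    int remembered;      /* old object that had a new object stored into it */
    int marked;          /* reached during the current scan */
    struct Object *ref;  /* single outgoing pointer, for brevity */
} Object;

static Object heap[HEAP_SIZE];

static Object *gen_alloc(void)
{
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        if (!heap[i].in_use) {
            heap[i] = (Object){ .in_use = 1 };
            return &heap[i];
        }
    }
    return NULL;  /* heap full */
}

/* Write barrier: record old objects that now point at new objects. */
static void gen_store(Object *holder, Object *target)
{
    holder->ref = target;
    if (holder->old && target != NULL && !target->old)
        holder->remembered = 1;
}

/* Mark new objects reachable from obj. Stop at old objects, which a
   generational scan does not trace through. */
static void mark_new(Object *obj)
{
    while (obj != NULL && !obj->old && !obj->marked) {
        obj->marked = 1;
        obj = obj->ref;
    }
}

/* Run one generational collection. Returns the number of objects reclaimed. */
static size_t gen_collect(Object **roots, size_t root_count)
{
    size_t reclaimed = 0;
    for (size_t i = 0; i < root_count; i++)
        mark_new(roots[i]);
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        if (heap[i].in_use && heap[i].old && heap[i].remembered) {
            mark_new(heap[i].ref);
            heap[i].remembered = 0;  /* its referent is about to become old */
        }
    }
    for (size_t i = 0; i < HEAP_SIZE; i++) {
        Object *obj = &heap[i];
        if (!obj->in_use || obj->old)
            continue;
        if (obj->marked) {
            obj->marked = 0;
            obj->old = 1;            /* survivor ages into the old generation */
        } else {
            obj->in_use = 0;         /* dead new object is reclaimed */
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Old objects are never swept here, which is why a full collection is still needed for objects that die after aging.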

Thread-local collection

The third algorithm is the new thread-local collection. TLC is similar to generational collection: it scans and reclaims a subset of objects, trying to get lots of bang for the collector's buck. The thread-local hypothesis: most objects die without being reachable by any other thread. Newly-allocated objects are marked thread-local to the allocating thread. If a thread-local object becomes accessible to another thread (for example, a pointer to it is written into a global variable), then it has "escaped" and is moved out of the thread-local set. In a thread-local collection, a thread scans its own stack and its set of thread-local objects, and reclaims the dead objects.
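The bookkeeping for thread-local collection can be sketched as a toy in C. It simulates one thread's allocations only and is not how the real collector is built: TObject, tlc_escape(), tlc_store(), and tlc_collect() are hypothetical names, and each object carries a single outgoing pointer.

```c
#include <stddef.h>

#define TLC_HEAP_SIZE 16

typedef struct TObject {
    int in_use;           /* slot holds a live allocation */
    int is_local;         /* 1 until the object escapes its allocating thread */
    int marked;           /* reached during the current scan */
    struct TObject *ref;  /* single outgoing pointer, for brevity */
} TObject;

/* One simulated thread's allocations. */
static TObject tlc_heap[TLC_HEAP_SIZE];

/* New objects start out local to the allocating thread. */
static TObject *tlc_alloc(void)
{
    for (size_t i = 0; i < TLC_HEAP_SIZE; i++) {
        if (!tlc_heap[i].in_use) {
            tlc_heap[i] = (TObject){ .in_use = 1, .is_local = 1 };
            return &tlc_heap[i];
        }
    }
    return NULL;  /* heap full */
}

/* An object that becomes visible to other threads escapes, and so does
   everything reachable from it. */
static void tlc_escape(TObject *obj)
{
    while (obj != NULL && obj->is_local) {
        obj->is_local = 0;
        obj = obj->ref;
    }
}

/* Write barrier: storing into an escaped object escapes the target too. */
static void tlc_store(TObject *holder, TObject *target)
{
    holder->ref = target;
    if (!holder->is_local)
        tlc_escape(target);
}

/* Scan the thread's stack roots and its thread-local set, reclaiming local
   objects that are unreachable. Escaped objects are left for the other
   collectors. Returns the number of objects reclaimed. */
static size_t tlc_collect(TObject **stack, size_t stack_count)
{
    size_t reclaimed = 0;
    for (size_t i = 0; i < stack_count; i++) {
        for (TObject *obj = stack[i];
             obj != NULL && obj->is_local && !obj->marked; obj = obj->ref)
            obj->marked = 1;
    }
    for (size_t i = 0; i < TLC_HEAP_SIZE; i++) {
        TObject *obj = &tlc_heap[i];
        if (!obj->in_use || !obj->is_local)
            continue;
        if (obj->marked) {
            obj->marked = 0;
        } else {
            obj->in_use = 0;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Nothing in tlc_collect() touches an escaped object, so in this model the scan never has to look at state another thread could be changing.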

The advantage of thread-local collection is that it requires no synchronization with other threads. Normally, a thread performing GC work needs to coordinate with the other threads. For example, the other threads might change a pointer variable after the GC thread has looked at it, or start pointing to an object that the GC thread thinks is dead. Thread-local collection avoids these complications. The thread-local objects are by definition reachable only by one thread. The other threads have no way to acquire a pointer to any of those objects, or change pointer values inside them. A thread performing thread-local collection can work quickly on its own, without interference from other threads.

Having each thread "clean up after itself" reduces bottlenecks in the collector that will only get worse as threads and cores increase. It's trivial to run thread-local collections on multiple threads simultaneously. And it's very fast because the only memory to scan is the thread's own stack and its surviving thread-local objects. The pause time for a thread during generational or full GC is almost as big as the pause time for that thread to run a thread-local collection - but TLC can then immediately reclaim some garbage, whereas the other algorithms need to do much more work and coordinate with all of the other threads before they can actually reclaim anything.

How you can help

Thread-local collection works best when objects remain unreachable to other threads. In the Objective-C collector, this means avoiding CFRetain() of temporary objects when possible. A CFRetained pointer could go anywhere behind the collector's back, bypassing the write barrier that the collector uses to keep track of escaping objects. (This is one place that Snow Leopard leaves room for improvement: the system frameworks often allocate objects with a CF retain count of one and immediately release them, making them ineligible for thread-local collection.) Other ways for an object to escape thread-local collection include storing a pointer into a global variable; storing a pointer into some other object that itself is not thread-local; and making a weak reference or associated reference to the object.

If your thread has just created and discarded a lot of temporary objects, you can give the collector a hint that now might be a good time to run. -[NSGarbageCollector collectIfNeeded] and -[NSAutoreleasePool drain] are two such hints. These may run a thread-local collection first, and may follow up with generational or full collection.

(link) [objc explain]: So you crashed in objc_msgSend(): iPhone Edition   (2009-06-08 11:40 PM)
 

So you crashed in objc_msgSend() has been updated with register usage for iPhone's ARM processor. The table now looks like this:

              objc_msgSend
              objc_msgSend_fpret      objc_msgSend_stret
              receiver    SEL         receiver    SEL
    i386      eax*        ecx         eax*        ecx
    x86_64    rdi         rsi         rsi         rdx
    ppc       r3          r4          r4          r5
    ppc64     r3          r4          r4          r5
    arm       r0          r1          r1          r2

(link) Valgrind for Mac OS X goes mainline   (2009-06-03 7:20 PM)
 

Thanks to the heroic work of Nicholas Nethercote and Julian Seward, the Mac OS X port of Valgrind is now available on Valgrind's trunk. This is a big step forward towards an official Valgrind release with Mac OS X support.

For those of you with Snow Leopard seeds, Valgrind won't work. Valgrind operates at the low-level unsupported guts of the kernel/Libc interface. When the kernel and Libc change, Valgrind needs to adapt or die. Valgrind support for Snow Leopard will not be available until the open-source release of Snow Leopard's kernel and Libc at the earliest, which in turn is not before Snow Leopard itself ships.

(link) [objc explain]: Monomorphic dispatch   (2009-05-14 8:04 PM)
 

Polymorphic dispatch means a single call site could branch to one of several different implementations. C function calls are not polymorphic; Objective-C methods and C++ virtual methods are polymorphic.

The monomorphic dispatch optimization is used when a call site could call different implementations in principle, but can only ever call one particular implementation in reality. Then the optimizer can eliminate the polymorphic dispatcher's overhead and jump directly to the right place or even inline the callee locally. This is a classic optimization for dynamic-compiled runtimes from Smalltalk to Java and JavaScript.

There are some complications. First is dynamically-loaded code like shared libraries or eval() operations. If your new code provides a second implementation of a call that was previously monomorphic, you need to be able to undo the previous optimization on the fly and recompile it or fall back to an interpreter. Any dynamic compiler worth the name can do this nowadays.

Second, the area to search for additional implementations depends on the type-strictness of your language. If the compile-time type of the receiver rigidly defines the allowed runtime types, then the implementation need only be unique within that part of the hierarchy for the optimization to work. Window.title and Employee.title wouldn't interfere with each other at a strictly-typed call site whose receiver is of type Employee (or a subclass thereof).

How does Objective-C fit in here? In general, it doesn't. The monomorphic optimization is hard to apply to Objective-C. The two problems above loom large because of the language's definition, even if it suddenly acquired a runtime recompiler tomorrow.

Objective-C's call sites are not type-strict. The code may say the receiver is of some type, but at runtime it could actually be a Distributed Objects proxy or a unit test mock or a scripting bridge shim. You'd have to look at all classes to decide if a selector has multiple implementations, instead of searching only a subtree of the class hierarchy.

Even worse, there are zero selectors that have only a single implementation. They all have at least two: the one that exists in some class, and the one from every other class that calls -forwardInvocation:. You can never jump directly to any implementation, because if your receiver object is of the wrong type then you need to call the forwarding machinery instead. And checking the receiver's type quickly eats any optimization profit; you can only make a handful of checks before your cost is the same as objc_msgSend().

There are some important cases where monomorphic dispatch would still work in Objective-C. The container classes in particular have only one or two real implementations, so a receiver type check could be fast enough. And in other places you can make a single relatively expensive type check but then re-use the result many times, such as a series of [self ...] calls. The tricky part is identifying which selectors and call sites would optimize well, without taking too much time or memory to do so.
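A guarded direct call of this kind might look like the following C sketch. It is a toy model, not something the Objective-C runtime does today: Class, Instance, lookup(), and call_count() are hypothetical, each class implements a single selector, and lookup() merely stands in for the general objc_msgSend() path.

```c
#include <stddef.h>
#include <string.h>

typedef struct Class Class;
typedef struct { const Class *isa; int count; } Instance;
typedef int (*Method)(Instance *self);

struct Class {
    const char *selector;  /* each class implements one selector, for brevity */
    Method imp;
};

/* Stand-in for the forwarding machinery. */
static int forward(Instance *self) { (void)self; return -1; }

/* Slow path: look the selector up in the receiver's class, falling back
   to forwarding when the class does not implement it. */
static Method lookup(const Class *cls, const char *selector)
{
    if (cls != NULL && strcmp(cls->selector, selector) == 0)
        return cls->imp;
    return forward;
}

static int array_count(Instance *self) { return self->count; }

static const Class ArrayClass = { "count", array_count };
static const Class ProxyClass = { "description", forward };

/* Call site specialized for ArrayClass: one pointer comparison guards a
   direct call; any other receiver takes the general path. */
static int call_count(Instance *receiver)
{
    if (receiver->isa == &ArrayClass)
        return array_count(receiver);                 /* fast path */
    return lookup(receiver->isa, "count")(receiver);  /* slow path */
}
```

With one expected class the guard is a single comparison. Each additional class to check adds another comparison, which is how the profit gets eaten.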

The monomorphic dispatch optimization will be present in some future dynamic-recompiling Objective-C runtime, but it won't work as well as it does in other less-dynamic languages.

(link) [objc explain]: Classes and metaclasses   (2009-04-14 08:35 PM)
 

Objective-C is a class-based object system. Each object is an instance of some class; the object's isa pointer points to its class. That class describes the object's data: allocation size and ivar types and layout. The class also describes the object's behavior: the selectors it responds to and instance methods it implements.

The class's method list is the set of instance methods, the selectors that the object responds to. When you send a message to an instance, objc_msgSend() looks through the method list of that object's class (and superclasses, if any) to decide what method to call.

Each Objective-C class is also an object. It has an isa pointer and other data, and can respond to selectors. When you call a "class method" like [NSObject alloc], you are actually sending a message to that class object.

Since a class is an object, it must be an instance of some other class: a metaclass. The metaclass is the description of the class object, just like the class is the description of ordinary instances. In particular, the metaclass's method list is the class methods: the selectors that the class object responds to. When you send a message to a class - an instance of a metaclass - objc_msgSend() looks through the method list of the metaclass (and its superclasses, if any) to decide what method to call. Class methods are described by the metaclass on behalf of the class object, just like instance methods are described by the class on behalf of the instance objects.

What about the metaclass? Is it metaclasses all the way down? No. A metaclass is an instance of the root class's metaclass; the root metaclass is itself an instance of the root metaclass. The isa chain ends in a cycle here: instance to class to metaclass to root metaclass to itself. The behavior of metaclass isa pointers rarely matters, since in the real world nobody sends messages to metaclass objects.

More important is the superclass of a metaclass. The metaclass's superclass chain parallels the class's superclass chain, so class methods are inherited in parallel with instance methods. And the root metaclass's superclass is the root class, so each class object responds to the root class's instance methods. In the end, a class object is an instance of (a subclass of) the root class, just like any other object.

Confused? The diagram may help. Remember, when a message is sent to any object, the method lookup starts with that object's isa pointer, then continues up the superclass chain. "Instance methods" are defined by the class, and "class methods" are defined by the metaclass plus the root (non-meta) class.
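The same relationships can be written down as a small C model. This is a sketch for illustration only, not the runtime's data structures: ClassObj, the four class objects, and find_implementor() are hypothetical, and the method lists are just arrays of selector names.

```c
#include <stddef.h>
#include <string.h>

typedef struct ClassObj {
    struct ClassObj *isa;         /* a class's isa points to its metaclass */
    struct ClassObj *superclass;
    const char *methods[4];       /* selector names this class implements */
} ClassObj;

static ClassObj RootMeta, SubMeta, Root, Sub;

/* Wire up a root class, one subclass, and their metaclasses. */
static void build_hierarchy(void)
{
    Root     = (ClassObj){ &RootMeta, NULL,      { "description" } };
    Sub      = (ClassObj){ &SubMeta,  &Root,     { "title" } };
    /* The root metaclass is its own isa, and its superclass is the root class. */
    RootMeta = (ClassObj){ &RootMeta, &Root,     { "alloc" } };
    SubMeta  = (ClassObj){ &RootMeta, &RootMeta, { "sharedInstance" } };
}

/* Method lookup: start at the receiver's isa and walk up the superclass
   chain. Returns the class that implements the selector, or NULL. */
static ClassObj *find_implementor(ClassObj *receiver_isa, const char *selector)
{
    for (ClassObj *cls = receiver_isa; cls != NULL; cls = cls->superclass) {
        for (size_t i = 0; i < 4 && cls->methods[i] != NULL; i++) {
            if (strcmp(cls->methods[i], selector) == 0)
                return cls;
        }
    }
    return NULL;
}
```

For a message to an instance of Sub, the lookup starts at &Sub. For a message to the class object Sub, it starts at Sub.isa and walks SubMeta, RootMeta, and finally Root, which is why a class object also responds to the root class's instance methods.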

In proper computer science language theory, a class and metaclass hierarchy can be more free-form, with deeper metaclass chains and multiple classes instantiated from any single metaclass. Objective-C uses metaclasses for practical goals like class methods, but otherwise tends to hide metaclasses. For example, [NSObject class] is identical to [NSObject self], even though in formal terms it ought to return the metaclass that NSObject->isa points to. The Objective-C language is a set of practical compromises; here it limits the class schema before it gets too, well, meta.

(link) [objc explain]: Non-fragile ivars   (2009-01-27 09:30 PM)
 

Non-fragile instance variables are a headline feature of the modern Objective-C runtime available on iPhone and 64-bit Mac. They provide framework developers more flexibility without losing binary compatibility, and pave the way for automatically-synthesized property ivars and ivars declared outside a class's interface.

The fragile base class problem

Fragile ivars are a subset of the classic fragile base class problem. In some languages, a superclass cannot be changed without also recompiling all subclasses of that class. For example, adding data members or virtual member functions to a C++ superclass will break binary compatibility with any subclass of that class, even if the added members are private and invisible to the subclass.

In classic Objective-C, methods are mostly non-fragile, thanks to dynamic message dispatch. You can freely add methods to a superclass, as long as you don't have name conflicts. But Objective-C ivars are fragile on 32-bit Mac.

32-bit Mac: fragile Objective-C ivars

Say you're writing the world's next great pet shop application for Mac OS X Leopard. You might have this PetShopView subclass of NSView, with arrays for the puppies and kittens in the pet shop.

    NSView (Leopard)
         0  Class isa
         4  NSRect bounds
        20  NSView *superview
        24  NSColor *bgColor

    PetShopView
         0  Class isa
         4  NSRect bounds
        20  NSView *superview
        24  NSColor *bgColor
        28  NSArray *kittens
        32  NSArray *puppies

Then Mac OS X Def Leopard comes out, with its new multi-paw interface technology. The AppKit developers add some paw-tracking code to NSView.

    NSView (Def Leopard)
         0  Class isa
         4  NSRect bounds
        20  NSView *superview
        24  NSColor *bgColor
        28  NSSet *touchedPaws

    PetShopView
         0  Class isa
         4  NSRect bounds
        20  NSView *superview
        24  NSColor *bgColor
        28  NSArray *kittens
        32  NSArray *puppies

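Unfortunately, your kittens are doomed by fragile ivars: PetShopView's compiled code still expects kittens at offset 28, exactly where the new NSView now keeps touchedPaws. Alternatively, the AppKit developers are trapped with whatever ivars they chose in Mac OS X 10.0.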

iPhone and 64-bit Mac: non-fragile Objective-C ivars

What you and the AppKit developers really want is something like this.

    NSView (Def Leopard)
         0  Class isa
         4  NSRect bounds
        20  NSView *superview
        24  NSColor *bgColor
        28  NSSet *touchedPaws

    PetShopView
         0  Class isa
         4  NSRect bounds
        20  NSView *superview
        24  NSColor *bgColor
        28  NSSet *touchedPaws
        32  NSArray *kittens
        36  NSArray *puppies

Here, the runtime has recognized that NSView is now larger than it was when PetShopView was compiled. The subclass ivars slide in response, without recompiling any code, and the kittens are saved by a dynamic runtime.

How it works

The generated code for classic Objective-C ivar access works like a C struct field. The offset to the ivar is a constant determined at compile time. The new ivar code instead creates a variable for each ivar which contains the offset to that ivar, and all code that accesses the ivar uses that variable. At launch time, the runtime can change any ivar offset variable if it detects an oversize superclass.

In the pet shop example, _OBJC_IVAR_PetShopView_kittens is 28 at compile time, but the runtime changes it to 32 when it sees the Def Leopard version of NSView. No code needs to be recompiled, and the performance overhead of the extra ivar offset variable is small. AppKit is happy, you're happy, and the kittens are happy.
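The sliding step can be sketched in C roughly as follows. This models the idea rather than the compiler's real output: the variable names, slide_ivars(), and kittens_slot() are hypothetical, and the numbers are the 32-bit offsets from the pet shop example.

```c
#include <stddef.h>

/* Hypothetical per-ivar offset variables, initialized to the compile-time
   layout from the pet shop example (NSView ended at offset 28). */
static ptrdiff_t ivar_offset_kittens = 28;
static ptrdiff_t ivar_offset_puppies = 32;

/* Superclass instance size the subclass was compiled against. */
static size_t compiled_super_size = 28;

/* At launch: if the superclass grew, slide every subclass ivar offset by
   the difference. Returns the number of bytes slid. */
static size_t slide_ivars(size_t runtime_super_size)
{
    if (runtime_super_size <= compiled_super_size)
        return 0;
    size_t slide = runtime_super_size - compiled_super_size;
    ivar_offset_kittens += (ptrdiff_t)slide;
    ivar_offset_puppies += (ptrdiff_t)slide;
    compiled_super_size = runtime_super_size;
    return slide;
}

/* Ivar access loads the offset from the variable instead of using a
   compile-time constant. */
static void **kittens_slot(void *object)
{
    return (void **)((char *)object + ivar_offset_kittens);
}
```

Calling slide_ivars(32) for the Def Leopard NSView moves kittens to 32 and puppies to 36. The extra cost on each access is one load of the offset variable.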

Greg Parker
gparker-web@sealiesoftware.com
Sealie Software