Hamster Emporium

by Greg Parker, runtime wrangler and general specialist

(link) [objc explain]: return value of message to nil   (2012-2-29 2:40 PM)
 

LLVM Compiler 3.0 (Xcode 4.2) or later

Integers up to 64 bits: 0
Floating-point up to long double: 0.0
Pointers: nil
Structs: {0}
Any _Complex type: {0, 0}

Notes

C++ objects returned by value are initialized to {0}, even if the type has a default constructor that does something else. This may be fixed in the future.
Struct return is undefined if you call objc_msgSend_stret() directly.
Struct return is undefined if you use an older compiler.
Floating-point return is undefined on Mac OS X 10.4 and earlier on PowerPC.
_Complex long double return is undefined if you use an older compiler.

(link) [objc explain]: objc_msgSend_vtable   (2011-06-17 4:42 PM)
 

objc_msgSend_vtable is a version of objc_msgSend used to optimize a few of the most commonly called methods.

Most Objective-C methods are dispatched using a hash table lookup inside objc_msgSend. On x86_64, a few selectors can be dispatched using a C++-style virtual table: an array lookup, not a hash table.

The compiler knows which selectors are optimized by the runtime. It compiles the call site differently, calling objc_msgSend_fixup via a function pointer. At runtime, objc_msgSend_fixup replaces the function pointer with one of the objc_msgSend_vtable functions, if the called selector is one of the optimized selectors.

C++ vtables are notoriously fragile: the array offsets for each virtual method are hardcoded into the generated code. Objective-C's vtables are not fragile. Each vtable is built at runtime and updated when method lists change. In theory even the set of optimized methods could be changed. The non-fragile flexibility costs an extra memory load during dispatch.

Dispatch via vtable is faster than a hash table, but would consume tremendous amounts of memory if used everywhere. Objective-C's vtable implementation limits its use to a few selectors that are (1) implemented everywhere, but (2) rarely overridden. That means most classes share their superclass's vtable, which keeps memory costs low.
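To make the array-lookup idea concrete, here is a minimal C sketch. It is not the runtime's actual data layout: the structs, the names send_via_vtable and build_classes, and the classes Base, Plain, and Text are all invented for illustration. Only the sixteen-slot size and the slot number for length are taken from the list in this post. It shows the two properties described above: the table is filled in at runtime, and a class that overrides nothing simply points at its superclass's table.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { VTABLE_SLOTS = 16, SLOT_LENGTH = 7 };

typedef int (*method_fn)(void *self);

struct cls {
    const struct cls *superclass;
    method_fn *vtable;   /* own table, or the superclass's if nothing is overridden */
};

struct obj { const struct cls *isa; };

/* Hypothetical implementations of the "length" slot. */
static int base_length(void *self) { (void)self; return 0; }
static int text_length(void *self) { (void)self; return 5; }

static method_fn base_vtable[VTABLE_SLOTS];
static method_fn text_vtable[VTABLE_SLOTS];
static struct cls Base, Plain, Text;

/* Tables are built at runtime rather than baked into the call sites. */
static void build_classes(void)
{
    base_vtable[SLOT_LENGTH] = base_length;
    Base.superclass = NULL;
    Base.vtable = base_vtable;

    /* Plain overrides nothing, so it shares Base's table: no extra memory. */
    Plain.superclass = &Base;
    Plain.vtable = Base.vtable;

    /* Text overrides length, so it needs a private copy with one slot changed. */
    memcpy(text_vtable, base_vtable, sizeof text_vtable);
    text_vtable[SLOT_LENGTH] = text_length;
    Text.superclass = &Base;
    Text.vtable = text_vtable;
}

/* Dispatch is plain indexing: object -> class -> table -> slot. No hashing. */
static int send_via_vtable(struct obj *o, int slot)
{
    return o->isa->vtable[slot](o);
}
```

After build_classes(), sending SLOT_LENGTH to a Plain instance runs base_length and sending it to a Text instance runs text_length. The memory cost only appears for classes like Text that override a slot.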

A crash in any objc_msgSend_vtable function should be debugged exactly like a crash in objc_msgSend itself. They both crash for all of the same reasons, like incorrect memory management or memory smashers.

Currently, the runtime uses sixteen different objc_msgSend_vtable functions, one for each slot in the sixteen-entry vtable.

objc_msgSend_vtable0    allocWithZone:
objc_msgSend_vtable1    alloc
objc_msgSend_vtable2    class
objc_msgSend_vtable3    self
objc_msgSend_vtable4    isKindOfClass:
objc_msgSend_vtable5    respondsToSelector:
objc_msgSend_vtable6    isFlipped
objc_msgSend_vtable7    length
objc_msgSend_vtable8    objectForKey:
objc_msgSend_vtable9    count
objc_msgSend_vtable10   objectAtIndex:
objc_msgSend_vtable11   isEqualToString:
objc_msgSend_vtable12   isEqual:
objc_msgSend_vtable13   retain (non-GC)
                        hash (GC)
objc_msgSend_vtable14   release (non-GC)
                        addObject: (GC)
objc_msgSend_vtable15   autorelease (non-GC)
                        countByEnumeratingWithState:objects:count: (GC)

The vtable's contents differ for GC and non-GC, for obvious reasons. -isFlipped is part of NSView. -countByEnumeratingWithState:objects:count: is the fast enumeration implementation, including for (x in y). Together these methods make up roughly 30-50% of calls in typical Objective-C applications.

(link) Dr. Gregory Parker, Department of Diagnostic Engineering   (2010-09-01 3:15 AM)
 

Last week, Rick Ballard came by my office for a consult. He had caught Xcode at a crash in objc_msgSend(). The crash looked like an intermittent problem that had been plaguing Xcode for months. So he called the local expert on debugging objc_msgSend(). Dr. Gregory Parker, Department of Diagnostic Engineering.

The good news was that Rick's crash was reliably reproducible. Running tests on a live patient is better than performing an autopsy on a dead one. The bad news was that the obvious debugging tools had not helped. NSZombieEnabled and guardmalloc had turned up nothing, and AUTO_USE_GUARDS=YES (the GC equivalent of guardmalloc) just thrashed the machine for two hours before running out of address space.

So you crashed in objc_msgSend(). The selector was -isAbsolutePath, which was reasonable but meant the debugger's backtrace was missing a frame. objc_msgSend() had read the class from the object, read the method cache from the class, read a method from the method cache, and crashed while trying to read the IMP from the method. Theory: either one of those data structures had been hit by a memory smasher, or the original object was bogus but happened to have dereferenceable pointers in the right places to survive that long. The method cache's mask was invalid - it should have been of the form 2^n-1 - so the failure must have been at or before that point in the chain.
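The mask sanity check is a one-line bit trick. A small C sketch follows; the function name is made up, and this is just the arithmetic you would do by eye in the debugger, not code from the runtime. A mask that is a power of two minus one has all of its set bits contiguous from bit 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if mask is a power of two minus one
   (0, 1, 3, 7, 15, ...), which is what a hash table mask should look like. */
static bool cache_mask_is_plausible(uint32_t mask)
{
    /* Adding 1 carries through every low set bit, so a valid mask and
       mask + 1 have no bits in common. */
    return (mask & (mask + 1)) == 0;
}
```

For example, 7 and 0xff pass, while 6 or a pointer-like value such as 0xa0050000 fail.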

The object pointer itself looked plausible. Theory: the object was valid, but a previous object at the same location had been used after being freed. We had the great luxury of a reproducible crash, so we turned on MallocStackLoggingNoCompact and ran it again. That memory had only been used for one object, and it had not been deallocated. So the evidence did not support the use-after-free theory. But the history showed that the object had been allocated as an NSPathStore2 - an internal subclass of NSString for file pathnames - which matched the selector -isAbsolutePath and matched the call site's expectations. The theory that the object pointer was valid looked good.

The object pointer was good, and the method cache was not: the failure was on the chain between them. The contents of the object looked good. The bytes looked like alternating zero and ASCII, which is a dead giveaway for the UTF-16 used inside NSString. The string value decoded as @"/Xcode4/usr/bin/llvm-gcc", which made sense in the call site's context.

The object's isa pointer was not so good. Its value was 0xa0050000. This was not class NSPathStore2 or any other class. vmmap showed it to be in Foundation's data segment, and otool showed it was specifically in Foundation's constant CF strings. But instead of pointing to the start of some string, it pointed to the middle of a string object. That string object was @"tzm-Latn": some localization thingy, perhaps? Theory: some bug had replaced this object's isa pointer with a pointer to the middle of an unrelated localization string object. This did not sound like a good theory.

Go back to the board. Symptom: the object was allocated as an NSPathStore2. Symptom: the object's isa pointer is now 0xa0050000, which is not NSPathStore2. What should the isa pointer's value have been? otool and objc_getClass() agreed: the correct isa pointer should have been 0xa005f198. 0xa0050000 is suspiciously similar. Theory: something had cleared two bytes of this object, leaving a nonsense isa pointer. @"tzm-Latn" was a red herring.

Aha! This is 32-bit i386. Little endian. The pointer 0xa005f198 is stored backwards in memory: 0x98 0xf1 0x05 0xa0. Clearing the least-significant bytes of the isa pointer meant clearing bytes 0 and 1 of the object, not bytes 2 and 3. Damage to bytes 0 and 1 is exactly what you'd expect from a two-byte overrun of the object preceding this one in memory. Theory: the bug was in that preceding object, and this NSPathStore2 object was an innocent victim.
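The byte arithmetic can be checked with a few lines of C. This sketch writes the bytes out in little-endian order by hand, so it gives the same answer on any host. The helper names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Store a 32-bit value in little-endian byte order, as i386 does. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* Read a 32-bit little-endian value back. */
static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Simulate a two-byte overrun from the preceding object: zero bytes 0 and 1
   of the victim's isa field and return the pointer value that remains. */
static uint32_t isa_after_two_byte_overrun(uint32_t isa)
{
    uint8_t object[4];
    store_le32(object, isa);   /* 0xa005f198 is stored as 98 f1 05 a0 */
    object[0] = 0;
    object[1] = 0;
    return load_le32(object);
}
```

Feeding in the correct isa 0xa005f198 gives back 0xa0050000, the bogus value seen in the crash.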

malloc_history works with a pointer to the middle of an allocation, too. We plugged in object-1 and got back an instance of DVTSourceModelItem, not deallocated. Rick recognized this as part of Xcode's indexer, which was always running in another thread at the time of the crash. A buffer overrun from the DVTSourceModelItem object fit the symptoms.

But where was the buffer? I had expected an overrun in some heap-allocated C array, not an ordinary object. Nor did DVTSourceModelItem have any C arrays in its instance variables.

Theory: the compiler or runtime had allocated too little memory for the instance of class DVTSourceModelItem, and ordinary ivar access had overrun that allocation. It was a long shot, but easy to test. malloc_size() and class_getInstanceSize() and an eyeball count of ivars all agreed that the object was 32 bytes. Theory disproved.

We tested the overrun theory again. Add an unused ivar to the end of DVTSourceModelItem, recompile, and run it. No crash. Remove the ivar. Crash. The extra ivar "fixed" the bug. The buffer overrun theory still fit the evidence, but we couldn't find it.

No more ideas. We needed data. Debugger watchpoints were out: there were thousands of instances of DVTSourceModelItem, and we couldn't watch two bytes after each of them. We were not yet desperate enough to try brute force code inspection. AUTO_USE_GUARDS=YES could catch it, if it didn't fall over first. Since we had a suspect in mind, we could play the guardmalloc trick ourselves with a narrower target. Override +[DVTSourceModelItem alloc], mprotect() the page after the allocation, and cross our fingers really hard hoping that it still reproduced after changing the timing so much.
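The allocation half of that trick looks roughly like this in plain C. This is a sketch under assumptions, not the code we actually wrote: guarded_alloc is a made-up name, it uses mmap() rather than hooking +alloc, it never frees anything, and it only keeps the returned pointer aligned if the size is a multiple of the required alignment (true for a 32-byte object).

```c
#define _DEFAULT_SOURCE   /* expose MAP_ANON on glibc */
#define _DARWIN_C_SOURCE  /* expose MAP_ANON on Mac OS X */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: return `size` bytes whose last byte sits directly
   before an inaccessible page, so any overrun faults immediately.
   Returns NULL on failure. */
static void *guarded_alloc(size_t size)
{
    if (size == 0) return NULL;

    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = (size + page - 1) / page;
    size_t total = (data_pages + 1) * page;      /* data plus one guard page */

    uint8_t *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANON, -1, 0);
    if (base == MAP_FAILED) return NULL;

    /* Make the final page unreadable and unwritable. */
    uint8_t *guard = base + data_pages * page;
    if (mprotect(guard, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }

    /* Push the allocation up against the guard page. */
    return guard - size;
}
```

Each allocation burns at least two pages, which is why this is only practical for one suspect class rather than the whole heap.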

Bang! It crashed (good) somewhere new (also good). DVTSourceModelItem -init was writing to one of its own instance variables. The ivar was a bit in a bitfield, and that bitfield was at the end of the ivar list.

Disassemble. The generated code read 4 bytes around the bit into a register, changed the bit in that register, and wrote the 4 bytes back to memory. That's typical for a bitfield. The unexpected part was that the 4 bytes spanned the last two bytes of the object and the first two bytes after the object. That's a bug. Most of the time the out-of-bounds access is harmless - it reads two bytes it shouldn't, and writes back the same value. But if there's another thread it can crash:

Thread 1                             Thread 2
reads four bytes, including two
bytes outside the object
                                     allocates a new object
                                     writes an isa pointer
writes four bytes, clobbering the
new value written by Thread 2
                                     crashes
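The interleaving can be replayed deterministically in a single thread. The following C sketch does that on a fake little-endian heap. The 32-byte object size comes from this post; the function name, the heap layout, and the choice of which bit gets set are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { OBJ_SIZE = 32 };   /* size of the first object */

/* Replay the race on a fake heap holding two adjacent objects.
   Returns the second object's isa as it reads back afterwards. */
static uint32_t replay_bitfield_race(uint32_t isa)
{
    uint8_t heap[2 * OBJ_SIZE] = {0};
    uint8_t *word_addr = heap + OBJ_SIZE - 2;  /* last 2 bytes of object 1 + 2 beyond */
    uint8_t *next_obj = heap + OBJ_SIZE;       /* where object 2 will live */
    uint8_t reg[4];

    /* Thread 1: read four bytes, including two outside the object. */
    memcpy(reg, word_addr, 4);

    /* Thread 2: allocate the next object and write its isa (little-endian). */
    next_obj[0] = (uint8_t)(isa & 0xff);
    next_obj[1] = (uint8_t)((isa >> 8) & 0xff);
    next_obj[2] = (uint8_t)((isa >> 16) & 0xff);
    next_obj[3] = (uint8_t)((isa >> 24) & 0xff);

    /* Thread 1: set its bit, then write all four stale bytes back. */
    reg[0] |= 0x01;
    memcpy(word_addr, reg, 4);

    /* Thread 2: read the isa again before sending a message. */
    return (uint32_t)next_obj[0] | ((uint32_t)next_obj[1] << 8) |
           ((uint32_t)next_obj[2] << 16) | ((uint32_t)next_obj[3] << 24);
}
```

With an isa of 0xa005f198 the function returns 0xa0050000: the low two bytes were overwritten with the stale zeros that Thread 1 had read earlier.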

Theory: a compiler bug generated bad code for DVTSourceModelItem's bitfield ivar, causing a read-modify-write out of bounds by two bytes, which corrupted memory in other threads. Test: try a different compiler. DVTSourceModelItem.m was built with clang, so we recompiled with llvm-gcc. No crash, and the disassembly looked correct. Compile with clang again, crash again.

Diagnosis: clang compiler bug in bitfield ivars. The patient's symptoms were treated with an extra ivar in DVTSourceModelItem until a compiler transplant could be performed.

Elapsed time: about three hours. Too long for an episode of a TV procedural drama, unfortunately.

(link) TargetConditionals.h   (2010-8-16 2:30 PM)
 
                          Mac OS X   iOS device   iOS simulator
TARGET_OS_MAC             1          1            1
TARGET_OS_IPHONE          0          1            1
TARGET_OS_EMBEDDED        0          1            0
TARGET_IPHONE_SIMULATOR   0          0            1
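A short C sketch of how the table is typically used. The function name target_name is invented, and the fallback branch for non-Apple compilers is an addition so the snippet builds anywhere. Note the order of the tests: TARGET_OS_MAC is 1 in every column, so the more specific macros have to be checked first.

```c
#include <assert.h>
#include <string.h>

#if defined(__APPLE__)
#include <TargetConditionals.h>
#endif

/* Hypothetical helper: name the build target using the macros in the table. */
static const char *target_name(void)
{
#if defined(TARGET_IPHONE_SIMULATOR) && TARGET_IPHONE_SIMULATOR
    return "iOS simulator";
#elif defined(TARGET_OS_IPHONE) && TARGET_OS_IPHONE
    return "iOS device";
#elif defined(TARGET_OS_MAC) && TARGET_OS_MAC
    return "Mac OS X";
#else
    return "not an Apple platform";
#endif
}
```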

(link) Do-it-yourself Objective-C weak import   (2010-4-8 10:23 PM)
 

WARNING DANGER HAZARD BEWARE EEK

The scheme described herein is UNTESTED and probably BUGGY. Use at your own risk.

Executive summary

The Objective-C runtime supports weak-imported classes back to iPhone OS 3.1. An app could use a class added in iPhone OS 3.2 or 4.0 and still run on 3.1. The app would check if [SomeClass class] is nil and act accordingly.

Unfortunately, the compilers and class declarations in framework headers do not support weak import yet. But you may be able to use weak linking anyway, by adding the right incantations yourself.

To use a class SomeClass that is unavailable on some of your app's deployment targets, write this in every file that uses the class:

    asm(".weak_reference _OBJC_CLASS_$_SomeClass");

To subclass a class SomeClass that is unavailable on some of your app's deployment targets, write this in the file containing your subclass's @implementation:

    asm(".weak_reference _OBJC_CLASS_$_SomeClass");
    asm(".weak_reference _OBJC_METACLASS_$_SomeClass");

This will not work for apps running on iPhone OS 3.0 or older. Only iPhone OS 3.1 and newer has any hope of success. Of course, since this is UNTESTED it may not work there either.

How it works

Say you're writing a game, and want to use the hypothetical UIDancePad class added to iPhone OS 3.2. (Do not dance on iPad.) When you use class UIDancePad in your code, the compiler emits a C symbol pointing to the class:

    .long _OBJC_CLASS_$_UIDancePad

Since UIDancePad is in a framework instead of your code, the symbol remains undefined in your executable, as shown by `nm -m`:

    (undefined) external _OBJC_CLASS_$_UIDancePad (from DanceKit)

When you run on iPhone OS 3.2, everything works great: the dynamic loader opens your executable and DanceKit, and binds your undefined symbol to their class definition.

Things don't go so well on iPhone OS 3.1. DanceKit exists but does not define UIDancePad. The dynamic loader is unable to resolve your undefined symbol, and the process halts:

    dyld: Symbol not found: _OBJC_CLASS_$_UIDancePad
        Referenced from: /path/to/YourApp
        Expected in: /path/to/DanceKit

Weak import solves this. The compiled symbol reference is now a weak one:

    .weak_reference _OBJC_CLASS_$_UIDancePad
    .long _OBJC_CLASS_$_UIDancePad

    (undefined) weak external _OBJC_CLASS_$_UIDancePad (from DanceKit)

The dynamic loader shrugs its shoulders if a weak reference cannot be resolved, and sets the pointer to NULL. The Objective-C runtime sees the NULL pointer and fixes up the rest of the metadata as if UIDancePad never existed.

As mentioned above, the compiler and framework header support is not yet in place. The incantations simply add the assembler directives that the compiler does not yet know how to emit:

    asm(".weak_reference _OBJC_CLASS_$_UIDancePad");

Et voilà: weak import of an Objective-C class. Well, maybe. I have only tested this on toy examples, none of which got anywhere close to any version of iPhone OS. Coder beware!

(What about the _OBJC_METACLASS symbol, you ask? When you subclass a class, your subclass's metaclass's superclass pointer points to the subclass's superclass's metaclass. In other words, your subclass's @implementation points to both its superclass and its superclass's metaclass. That requires two symbols: one for the class and one for the metaclass. When you simply use a class without subclassing it, you don't need the metaclass pointer.)

(link) [objc explain]: Weak-import classes   (2009-09-09 1:30 PM)
 

Weak-import classes are a useful new Objective-C feature that you can't use yet.

Weak import is a solution when you want to use something from a framework, but still need to be compatible with older versions of the framework that didn't support it yet. Using weak import you can test if the feature exists at runtime before you try to use it.

Objective-C has not previously supported weak import for classes. Instead you had to use clumsy runtime introspection to check whether a class was available, store a pointer to that class in a variable, and use that variable when you wanted to send a message to the class. Even worse, there was no reasonable way to create your own subclass of a superclass that might be unavailable. Some developers put the subclass in a separate library that was not loaded until after checking that the superclass was present, but even that trick is not allowed on iPhone OS.

Weak import for C functions works by checking the weak-imported function pointer's value before calling it:

    if (NSNewFunction != NULL) {
        NSNewFunction(...);
    } else {
        // NSNewFunction not supported on this system
    }

The same mechanism is a natural fit with Objective-C classes and Objective-C's handling of messages to nil. These constructs are much nicer than NSClassFromString() or a separate NSBundle.

    if ([NSNewClass class] != nil) {
        [NSNewClass doSomething];
    } else {
        // NSNewClass is unavailable on this system
    }

    @interface MySubclass : NSNewClass ... @end
    MySubclass *obj = [[MySubclass alloc] init];
    if (!obj) {
        // MySubclass (or a superclass thereof) is unavailable on this system
    }

Weak import of Objective-C classes is now available. But you can't use it yet. First, it's only supported today on iPhone OS 3.1; it's expected to arrive in a future Mac OS.

Second, there's nothing you can do with weak import until the first OS update after iPhone OS 3.1. Then you could write an app that adopted new features in that future version, and used weak import to be compatible with 3.1. (It still could not run on 3.0 or 2.x, because those systems lack the runtime machinery to process the weak import references.)

Weak import for Objective-C did not make Snow Leopard for scheduling reasons. Assuming it ships in Mac OS X 10.7 Cat Name Forthcoming, you won't be able to use it until Mac OS X 10.8 LOLcat.

(link) Colorized keyboard backlight   (2009-09-05 1:15 AM)
 

I use my MacBook Pro for astronomy. The backlit keyboard would be great in the dark, but its white light is bad for night vision. This mod makes it red, or any other color you want.

(link) [objc explain]: Selector uniquing in the dyld shared cache   (2009-09-01 2:10 AM)
 

Mac OS X Snow Leopard cuts in half the launch-time overhead of starting the Objective-C runtime, and simultaneously saves a few hundred KB of memory per app. This comes for free to every app, courtesy of one of the few pieces of Mac OS X that lives below even the Objective-C runtime: dyld.

dyld and the shared cache

dyld is the dynamic loader and linker. When your process starts, dyld loads your executable and its shared libraries into memory, links the cross-library C function and variable references together, and starts execution on its way towards main().

In theory a shared library could be different every time your program is run. In practice, you get the same version of the shared libraries almost every time you run, and so does every other process on the system. The system takes advantage of this by building the dyld shared cache. The shared cache contains a copy of many system libraries, with most of dyld's linking and loading work done in advance. Every process can then share that shared cache, saving memory and launch time.

(Incidentally, the shared cache beats the pants off the pre-Leopard prebinding system that was supposed to achieve the same optimizations. Remember the post-install "Optimizing System Performance" step that often took longer than the install itself? That was prebinding being updated. Rebuilding the shared cache is so blazingly fast that the installer doesn't bother to report it anymore.)

Objective-C selector uniquing

Leopard's dyld shared cache is great for C code, but it didn't do anything to help Objective-C's startup overhead. The single biggest launch cost for Objective-C is selector uniquing. The app and every shared library contain their own copies of selector names like "alloc" and "init". The runtime needs to choose a single canonical SEL pointer value for each selector name, and then update the metadata for every call site and method list to use the blessed unique value. This means building a big hash table (memory), calling strcmp() a lot (time), and modifying copy-on-write metadata (more memory).
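For a feel of what uniquing involves, here is a toy version in C. It is a sketch only: sel_unique is a made-up name, the table is a fixed-size open-addressing hash that is never resized, and the real runtime's table and SEL representation are not shown in this post. The shape of the work is the same, though: hash the name, strcmp() against what is already there, and hand back one canonical pointer per name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { TABLE_SIZE = 64 };   /* power of two; toy capacity */

static const char *table[TABLE_SIZE];

/* djb2-style string hash. */
static size_t hash_name(const char *s)
{
    size_t h = 5381;
    while (*s) h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Hypothetical helper: return the canonical pointer for `name`, which is
   the first copy of that name ever registered. Returns NULL if the toy
   table is full. */
static const char *sel_unique(const char *name)
{
    size_t i = hash_name(name) & (TABLE_SIZE - 1);
    for (size_t probes = 0; probes < TABLE_SIZE; probes++) {
        if (table[i] == NULL) {          /* first sighting: this copy wins */
            table[i] = name;
            return name;
        }
        if (strcmp(table[i], name) == 0) /* seen before: reuse the winner */
            return table[i];
        i = (i + 1) & (TABLE_SIZE - 1);  /* linear probing */
    }
    return NULL;
}
```

If two libraries each carry their own "alloc" string, both calls return the address of whichever copy was registered first, and every selector reference then has to be rewritten to that address. That rewrite is the copy-on-write cost mentioned above.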

There are tens of thousands of unique selectors present in a typical process. If you run `strings /usr/lib/libobjc.dylib` on Leopard you can see the thirty-thousand-line built-in selector table that was a previous attempt to reduce the memory cost. Even so the cost goes up with every new class and method added to Cocoa.framework; left unchecked, an identical app would take longer to launch and use more memory after every OS upgrade.

The obvious solution? Do the work of selector uniquing in the dyld shared cache. Build a selector table into the shared cache itself, and update the selector references in the cached copy of the shared libraries. Then you save memory because every process shares the same selector table, and save time because the runtime does not need to rebuild it during every app launch. The runtime only needs to fix the selector references from the app itself. The catch? Selectors are too dynamic to be implemented as C symbols, so the shared cache construction tool needed to be taught how to read and write Objective-C's metadata.

Optimization WIN

Snow Leopard's dyld shared cache uniques Objective-C selectors, and Snow Leopard's Objective-C runtime recognizes when the selectors in a shared library are already uniqued courtesy of the shared cache. About half of the runtime's initialization time is eliminated, making warm app launch several tenths of a second faster. Typical memory savings is 200-500 KB per process, adding up to a few megabytes system-wide. When this optimization ships on the iPhone OS side, it's estimated to save 1 MB on a 128 MB device. The iPhone performance team would pay any number of arms and legs for that kind of gain.

You can watch the system in action with various debugging flags.

$ sudo /usr/bin/update_dyld_shared_cache -debug -verify
[...]
update_dyld_shared_cache: for x86_64, uniquing objc selectors
update_dyld_shared_cache: for x86_64, found 68761 unique objc selectors
update_dyld_shared_cache: for x86_64, 541736/590908 bytes (91%) used in libobjc unique selector section
update_dyld_shared_cache: for x86_64, updated 205230 selector references

$ OBJC_PRINT_PREOPTIMIZATION=YES /usr/bin/defaults
objc[424]: PREOPTIMIZATION: selector preoptimization ENABLED (version 3)
objc[424]: PREOPTIMIZATION: honoring preoptimized selectors in /usr/lib/libobjc.A.dylib
objc[424]: PREOPTIMIZATION: honoring preoptimized selectors in /System/Library/Frameworks/CoreFoundation.framework/Versions/A/CoreFoundation
objc[424]: PREOPTIMIZATION: honoring preoptimized selectors in /System/Library/Frameworks/CoreServices.framework/Versions/A/Frameworks/Metadata.framework/Versions/A/Metadata
objc[424]: PREOPTIMIZATION: honoring preoptimized selectors in /System/Library/Frameworks/Foundation.framework/Versions/C/Foundation

You can estimate the memory savings with the allmemory tool. Record post-launch memory usage of an app run with and without environment variable OBJC_DISABLE_PREOPTIMIZATION=YES. Look for the count of dirty pages; each dirty page is 4 KB eaten by that process. With 64-bit TextEdit I see the dirty page count jump from 725 to 1069 after disabling the optimization. This is an overestimate - many of those pages would have been not-dirty in Leopard because of the old built-in selector table - but it does show the magnitude of the win.

The Objective-C runtime does more than just selector uniquing during launch. Future improvements to the dyld shared cache may precompute some of that other work, to further improve launch time, save memory, and reduce the cost of linking to Objective-C code that you don't actually use. But selector uniquing as seen in Snow Leopard is by far the biggest bang for the buck.

(link) [objc explain]: Thread-local garbage collection   (2009-08-28 1:00 PM)
 

Mac OS X Snow Leopard introduces thread-local collection, a big enhancement to the Objective-C garbage collector. Thread-local collection is a more efficient way to reclaim much of the garbage in most programs. It also scales better to more threads and more cores than the other algorithms used by the Objective-C GC.

A brief history of garbage collection

The simplest algorithm used by the Objective-C GC is full collection. The GC scans all live objects in the entire heap, discovers (approximately) all dead objects, and reclaims them. This is slow, especially if you have a large population of not-dead objects, but does find all possible garbage. Historically this is mostly 1960-era technology, except for the machinery that allows other threads to run mostly unhindered while the scan completes.

The second algorithm used by the Objective-C GC is generational collection. This takes advantage of the generational hypothesis: most objects die young. The heap is divided into at least two generations: new objects and old objects. After allocating some amount of new objects, the collector runs a generational collection. First, it scans only the "new" objects and any "old" objects into which a pointer to a "new" object was stored. Then the now-dead "new" objects are reclaimed, and the surviving "new" objects are aged, moving them to the next generation. The advantage of generational GC is that it collects lots of garbage (most objects die young) with much less work (it does not need to scan most of the "old" objects). Full collections are still needed to reclaim objects that survive infancy and die later, but run less often. Generational collection is 1980-era technology.

Thread-local collection

The third algorithm is the new thread-local collection. TLC is similar to generational collection: it scans and reclaims a subset of objects, trying to get lots of bang for the collector's buck. The thread-local hypothesis: most objects die without being reachable by any other thread. Newly-allocated objects are marked thread-local to the allocating thread. If a thread-local object becomes accessible to another thread (for example, a pointer to it is written into a global variable), then it has "escaped" and is moved out of the thread-local set. In a thread-local collection, a thread scans its own stack and its set of thread-local objects, and reclaims the dead objects.
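A toy model in C may help. Everything here is invented for illustration and is far simpler than the real collector: objects hold a single outgoing pointer, the "stack" is an explicit root array, and the only write barriers are store_global and store_ref. It does show the three moving parts described above: objects start thread-local, a barrier moves escaping objects out of the local set, and a thread-local collection scans only the thread's own roots and local objects.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { MAX_OBJECTS = 8 };

struct object {
    bool in_use;
    bool is_local;        /* still reachable only from the allocating thread */
    bool marked;
    struct object *ref;   /* a single outgoing pointer keeps the sketch short */
};

struct thread {
    struct object heap[MAX_OBJECTS];    /* objects this thread allocated */
    struct object *roots[MAX_OBJECTS];  /* stand-in for pointers on its stack */
    size_t root_count;
};

static struct object *global_var;       /* visible to every thread */

/* New objects start out thread-local. Returns NULL when the toy heap is full. */
static struct object *tl_alloc(struct thread *t)
{
    for (size_t i = 0; i < MAX_OBJECTS; i++) {
        struct object *o = &t->heap[i];
        if (!o->in_use) {
            o->in_use = true;
            o->is_local = true;
            o->marked = false;
            o->ref = NULL;
            return o;
        }
    }
    return NULL;
}

/* An escaping object takes everything it points to along with it. */
static void escape(struct object *o)
{
    for (; o != NULL && o->is_local; o = o->ref)
        o->is_local = false;
}

/* Write barriers: stores that can publish a pointer to other threads. */
static void store_global(struct object *o) { global_var = o; escape(o); }

static void store_ref(struct object *dst, struct object *src)
{
    dst->ref = src;
    if (!dst->is_local) escape(src);
}

/* Scan this thread's roots and local objects only, then reclaim the local
   objects that were not reached. Escaped objects are left alone for the
   generational and full collectors. Returns the number reclaimed. */
static size_t collect_thread_local(struct thread *t)
{
    size_t reclaimed = 0;
    for (size_t i = 0; i < MAX_OBJECTS; i++) t->heap[i].marked = false;
    for (size_t i = 0; i < t->root_count; i++)
        for (struct object *o = t->roots[i];
             o != NULL && o->is_local && !o->marked; o = o->ref)
            o->marked = true;
    for (size_t i = 0; i < MAX_OBJECTS; i++) {
        struct object *o = &t->heap[i];
        if (o->in_use && o->is_local && !o->marked) {
            o->in_use = false;
            reclaimed++;
        }
    }
    return reclaimed;
}
```

Nothing in collect_thread_local touches another thread's data or takes a lock, which is the whole point: a temporary that was allocated and dropped without escaping is reclaimed with no coordination at all.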

The advantage of thread-local collection is that it requires no synchronization with other threads. Normally, a thread performing GC work needs to coordinate with the other threads. For example, another thread might change a pointer variable after the GC thread has looked at it, or start pointing to an object that the GC thread thinks is dead. Thread-local collection avoids these complications. The thread-local objects are by definition reachable only by one thread. The other threads have no way to acquire a pointer to any of those objects, or change pointer values inside them. A thread performing thread-local collection can work quickly on its own, without interference from other threads.

Having each thread "clean up after itself" reduces bottlenecks in the collector that will only get worse as threads and cores increase. It's trivial to run thread-local collections on multiple threads simultaneously. And it's very fast because the only memory to scan is the thread's own stack and its surviving thread-local objects. The pause time for a thread during generational or full GC is almost as big as the pause time for that thread to run a thread-local collection - but TLC can then immediately reclaim some garbage, whereas the other algorithms need to do much more work and coordinate with all of the other threads before they can actually reclaim anything.

How you can help

Thread-local collection works best when objects remain unreachable to other threads. In the Objective-C collector, this means avoiding CFRetain() of temporary objects when possible. A CFRetained pointer could go anywhere behind the collector's back, bypassing the write barrier that the collector uses to keep track of escaping objects. (This is one place that Snow Leopard leaves room for improvement: the system frameworks often allocate objects with a CF retain count of one and immediately release them, making them ineligible for thread-local collection.) Other ways for an object to escape thread-local collection include storing a pointer into a global variable; storing a pointer into some other object that itself is not thread-local; and making a weak reference or associated reference to the object.

If your thread has just created and discarded a lot of temporary objects, you can give the collector a hint that now might be a good time to run. -[NSGarbageCollector collectIfNeeded] and -[NSAutoreleasePool drain] are two such hints. These may run a thread-local collection first, and may follow up with generational or full collection.

(link) [objc explain]: So you crashed in objc_msgSend(): iPhone Edition   (2009-06-08 11:40 PM)
 

So you crashed in objc_msgSend() has been updated with register usage for iPhone's ARM processor. The table now looks like this:

          objc_msgSend
          objc_msgSend_fpret     objc_msgSend_stret
          receiver   SEL         receiver   SEL
i386      eax*       ecx         eax*       ecx
x86_64    rdi        rsi         rsi        rdx
ppc       r3         r4          r4         r5
ppc64     r3         r4          r4         r5
arm       r0         r1          r1         r2

Greg Parker
gparker-web@sealiesoftware.com
Sealie Software