
Hamster Emporium
by Greg Parker, runtime wrangler and general specialist

[objc explain]: So you crashed in objc_msgSend(): iPhone Edition
(2009-06-08 11:40 PM)
"So you crashed in objc_msgSend()" has been updated with register
usage for iPhone's ARM processor. The table now looks like this:
|        | objc_msgSend, objc_msgSend_fpret | objc_msgSend_stret |
|        | receiver | SEL                   | receiver | SEL     |
| i386   | eax*     | ecx                   | eax*     | ecx     |
| x86_64 | rdi      | rsi                   | rsi      | rdx     |
| ppc    | r3       | r4                    | r4       | r5      |
| ppc64  | r3       | r4                    | r4       | r5      |
| arm    | r0       | r1                    | r1       | r2      |

Valgrind for Mac OS X goes mainline
(2009-06-03 7:20 PM)
Thanks to the heroic work of Nicholas Nethercote and Julian
Seward, the Mac OS X port of Valgrind is now
available on Valgrind's trunk. This is a big step forward
towards an official Valgrind release with Mac OS X support.
For those of you with Snow Leopard
seeds, Valgrind won't
work. Valgrind operates at the low-level unsupported guts of the
kernel/Libc interface. When the kernel and Libc change, Valgrind
needs to adapt or die. Valgrind support for Snow Leopard will not
be available until the open-source release of Snow Leopard's
kernel and Libc at the earliest, which in turn is not before Snow
Leopard itself ships.

[objc explain]: Monomorphic dispatch
(2009-05-14 8:04 PM)
Polymorphic dispatch means a single call site could branch to one
of several different implementations. C function calls are not
polymorphic; Objective-C methods and C++ virtual methods are
polymorphic.
The monomorphic dispatch optimization is used when a call site
could call different implementations in principle, but can only
ever call one particular implementation in reality. Then the
optimizer can eliminate the polymorphic dispatcher's overhead and
jump directly to the right place or even inline the callee
locally. This is a classic
optimization for dynamic-compiled runtimes from Smalltalk to Java
and JavaScript.
There are some complications. First is dynamically-loaded code
like shared libraries or eval() operations. If your new code
provides a second implementation of a call that was previously
monomorphic, you need to be able to undo the previous optimization
on the fly and recompile it or fall back to an interpreter. Any
dynamic compiler worth the name can do this nowadays.
Second, the area to search for additional implementations depends
on the type-strictness of your language. If the compile-time type
of the receiver rigidly defines the allowed runtime types, then
the implementation need only be unique within that part of the
hierarchy for the optimization to work. Window.title
and Employee.title wouldn't interfere with each other
at a strictly-typed call site whose receiver is of type
Employee (or a subclass thereof).
How does Objective-C fit in here? In general, it doesn't. The
monomorphic optimization is hard to apply to Objective-C. The two
problems above loom large because of the language's definition,
even if it suddenly acquired a runtime recompiler tomorrow.
Objective-C's call sites are not type-strict. The code may say the
receiver is of some type, but at runtime it could actually be a
Distributed Objects proxy or a unit test mock or a scripting
bridge shim. You'd have to look at all classes to decide if a
selector has multiple implementations, instead of searching only a
subtree of the class hierarchy.
Even worse, there are zero selectors that have only a single
implementation. They all have at least two: the one that exists in
some class, and the one from every other class that calls
-forwardInvocation:. You can never jump directly to any
implementation, because if your receiver object is of the wrong
type then you need to call the forwarding machinery instead. And
checking the receiver's type quickly eats any optimization profit;
you can only make a handful of checks before your cost is the
same as objc_msgSend().
There are some important cases where monomorphic dispatch would
still work in Objective-C. The container classes in particular
have only one or two real implementations, so a receiver type
check could be fast enough. And in other places you can make a single
relatively expensive type check but then re-use the result many
times, such as a series of [self ...] calls. The
tricky part is identifying which selectors and call sites would
optimize well, without taking too much time or memory to do so.
The monomorphic dispatch optimization will be present in some
future dynamic-recompiling Objective-C runtime, but it won't work
as well as it does in other less-dynamic languages.
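Here is a minimal C sketch of what a guarded direct call might look like, assuming a toy object model. Every name in it (Object, Class, msg_send_describe, call_describe_guarded) is made up for illustration and is not runtime API. It shows one cheap receiver check followed by a direct call, with everything else falling back to the generic dispatcher.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature object model; not the real runtime's layout. */
typedef struct Class Class;
typedef struct Object { const Class *isa; int value; } Object;
typedef int (*Method)(Object *self);

struct Class {
    const Class *superclass;
    Method describe;   /* the only "selector" in this toy; NULL if not implemented */
};

static int slow_lookups = 0;   /* counts trips through the generic dispatcher */

/* Generic dispatch: walk the superclass chain until an implementation turns up. */
static int msg_send_describe(Object *self) {
    slow_lookups++;
    for (const Class *c = self->isa; c != NULL; c = c->superclass) {
        if (c->describe) return c->describe(self);
    }
    return -1;   /* stands in for the forwarding machinery */
}

static int employee_describe(Object *self) { return self->value + 1; }
static int proxy_describe(Object *self)    { return self->value * 100; }

static const Class EmployeeClass = { NULL, employee_describe };
static const Class ProxyClass    = { NULL, proxy_describe };

/* A call site specialized for the common receiver: one class check,
   then a direct (inlinable) call. Any other receiver takes the slow path. */
static int call_describe_guarded(Object *self) {
    assert(self != NULL);   /* nil receivers are not modeled in this toy */
    if (self->isa == &EmployeeClass) {
        return employee_describe(self);
    }
    return msg_send_describe(self);
}
```

An Employee receiver never touches the dispatcher, while a proxy standing in for one still works, just slowly. The sketch only pays off if the guard is cheaper than the dispatch it replaces, which is exactly the margin that is thin in Objective-C.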

[objc explain]: Classes and metaclasses
(2009-04-14 08:35 PM)
Objective-C is a class-based object system. Each object is an
instance of some class; the object's isa pointer
points to its class. That class describes the object's data:
allocation size and ivar types and layout. The class also
describes the object's behavior: the selectors it responds to and
instance methods it implements.
The class's method list is the set of instance methods, the
selectors that the object responds to. When
you send a message to an instance, objc_msgSend()
looks through the method list of that object's class (and
superclasses, if any) to decide what method to call.
Each Objective-C class is also an object. It has an
isa pointer and other data, and can respond to
selectors. When you call a "class method" like [NSObject
alloc], you are actually sending a message to that class
object.
Since a class is an object, it must be an instance of some other
class: a metaclass. The metaclass is the description of the class
object, just like the class is the description of ordinary
instances. In particular, the metaclass's method list is the
class methods: the selectors that the class object responds
to. When you send a message to a class - an instance of a
metaclass - objc_msgSend() looks through the method
list of the metaclass (and its superclasses, if any) to decide
what method to call. Class methods are described by the metaclass
on behalf of the class object, just like instance methods are
described by the class on behalf of the instance objects.
What about the metaclass? Is it metaclasses all the way down?
No. A metaclass is an instance of the root class's metaclass; the
root metaclass is itself an instance of the root metaclass. The
isa chain ends in a cycle here: instance to class to
metaclass to root metaclass to itself. The behavior of metaclass
isa pointers rarely matters, since in the real world
nobody sends messages to metaclass objects.
More important is the superclass of
a metaclass. The metaclass's superclass chain parallels the
class's superclass chain, so class methods are inherited in
parallel with instance methods. And the root metaclass's
superclass is the root class, so each class object responds to the
root class's instance methods. In the end, a class object is an
instance of (a subclass of) the root class, just like any other
object.
Confused? The diagram may help. Remember, when a message is sent
to any object, the method lookup starts with that object's
isa pointer, then continues up the superclass
chain. "Instance methods" are defined by the class, and "class
methods" are defined by the metaclass plus the root (non-meta) class.
In proper computer science
language theory, a class and metaclass hierarchy can be more
free-form, with deeper metaclass chains and multiple classes
instantiated from any single metaclass. Objective-C uses metaclasses
for practical goals like class methods, but otherwise tends to
hide metaclasses. For example, [NSObject class] is
identical to [NSObject self], even though in formal
terms it ought to return the metaclass that
NSObject->isa points to. The Objective-C language is
a set of practical compromises; here it limits the class schema
before it gets too, well, meta.
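The isa and superclass links are easier to follow when written out. This is a rough C model under invented names (ClassObj, a root class called Root, one subclass called View). The real runtime structures carry much more than these three fields.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of the isa and superclass links. Illustrative only. */
typedef struct ClassObj {
    const struct ClassObj *isa;         /* the class this object is an instance of */
    const struct ClassObj *superclass;  /* NULL for the root class */
    const char *name;
} ClassObj;

/* Tentative definitions so the cyclic links can be plain initializers. */
static const ClassObj RootMeta, RootClass, ViewMeta, ViewClass;

/* The root metaclass is an instance of itself, and its superclass is the root class. */
static const ClassObj RootMeta  = { &RootMeta, &RootClass, "Root (meta)" };
static const ClassObj RootClass = { &RootMeta, NULL,       "Root" };
/* Every other metaclass is an instance of the root metaclass. */
static const ClassObj ViewMeta  = { &RootMeta, &RootMeta,  "View (meta)" };
static const ClassObj ViewClass = { &ViewMeta, &RootClass, "View" };

/* Method lookup order for a message: start at the receiver's isa,
   then climb the superclass chain. Writes up to max names into out
   and returns how many were written. */
static size_t lookup_chain(const ClassObj *isa, const char **out, size_t max) {
    assert(out != NULL);
    size_t n = 0;
    for (const ClassObj *c = isa; c != NULL && n < max; c = c->superclass) {
        out[n++] = c->name;
    }
    return n;
}
```

Calling lookup_chain(ViewClass.isa, ...) models a class method call on View. The search visits View's metaclass, then the root metaclass, then the root class itself, which is why a class object also responds to the root class's instance methods.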

[objc explain]: Non-fragile ivars
(2009-01-27 09:30 PM)
Non-fragile instance variables are a headline feature of the
modern Objective-C runtime available on iPhone and 64-bit
Mac. They provide framework developers more flexibility without
losing binary compatibility, and pave the way for
automatically-synthesized property ivars and ivars declared
outside a class's interface.
The fragile base class problem
Fragile ivars are a subset of the classic fragile base
class problem. In some languages, a superclass cannot be
changed without also recompiling all subclasses of that class. For
example, adding data members or virtual member functions to a C++
superclass will break binary compatibility with any subclass of
that class, even if the added members are private and invisible to
the subclass.
In classic Objective-C, methods are mostly
non-fragile, thanks to dynamic message dispatch. You can freely
add methods to a superclass, as long
as you don't have name conflicts. But Objective-C ivars are
fragile on 32-bit Mac.
32-bit Mac: fragile Objective-C ivars
Say you're writing the world's next great pet shop
application for Mac OS X Leopard. You might have this
PetShopView subclass of
NSView, with arrays for the puppies and kittens in
the pet shop.
NSView (Leopard)
| offset | ivar              |
|      0 | Class isa         |
|      4 | NSRect bounds     |
|     20 | NSView *superview |
|     24 | NSColor *bgColor  |

PetShopView
| offset | ivar              |
|      0 | Class isa         |
|      4 | NSRect bounds     |
|     20 | NSView *superview |
|     24 | NSColor *bgColor  |
|     28 | NSArray *kittens  |
|     32 | NSArray *puppies  |
Then Mac OS X Def Leopard comes out, with its new multi-paw
interface technology. The AppKit developers add some paw-tracking
code to NSView.
NSView (Def Leopard)
| offset | ivar               |
|      0 | Class isa          |
|      4 | NSRect bounds      |
|     20 | NSView *superview  |
|     24 | NSColor *bgColor   |
|     28 | NSSet *touchedPaws |

PetShopView
| offset | ivar               |
|      0 | Class isa          |
|      4 | NSRect bounds      |
|     20 | NSView *superview  |
|     24 | NSColor *bgColor   |
|     28 | NSArray *kittens   |
|     32 | NSArray *puppies   |
Unfortunately, your kittens are doomed by fragile
ivars: PetShopView was compiled with kittens at offset 28, exactly
where NSView now keeps touchedPaws. Alternatively, the AppKit
developers are trapped with whatever ivars they chose in Mac OS X 10.0.
iPhone and 64-bit Mac: non-fragile Objective-C ivars
What you and the AppKit developers really want is something like
this.
NSView (Def Leopard)
| offset | ivar               |
|      0 | Class isa          |
|      4 | NSRect bounds      |
|     20 | NSView *superview  |
|     24 | NSColor *bgColor   |
|     28 | NSSet *touchedPaws |

PetShopView
| offset | ivar               |
|      0 | Class isa          |
|      4 | NSRect bounds      |
|     20 | NSView *superview  |
|     24 | NSColor *bgColor   |
|     28 | NSSet *touchedPaws |
|     32 | NSArray *kittens   |
|     36 | NSArray *puppies   |
Here, the runtime has recognized that NSView is now
larger than it was when PetShopView was compiled. The
subclass ivars slide in response, without recompiling any code,
and the kittens are saved by a dynamic runtime.
How it works
The generated code for classic Objective-C ivar access works like
a C struct field. The offset to the ivar is a
constant determined at compile time. The new ivar code instead
creates a variable for each ivar which contains the offset to that
ivar, and all code that accesses the ivar uses that
variable. At launch time, the runtime can change any ivar offset
variable if it detects an oversize superclass.
In the pet shop example,
_OBJC_IVAR_PetShopView_kittens is 28 at
compile time,
but the runtime changes it to 32 when it sees the Def Leopard
version of NSView. No code needs to be recompiled,
and the performance overhead of the extra ivar offset variable is
small. AppKit is happy, you're happy, and the kittens are happy.
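In C terms, the scheme might be sketched like this. The variable and function names (ivar_offset_kittens, slide_ivars, store_ivar, load_ivar) are simplified stand-ins invented for this example, and 4-byte values stand in for the 32-bit pointers of the pet shop layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Superclass instance sizes from the pet shop example (32-bit layout). */
enum { NSVIEW_SIZE_AT_COMPILE = 28, NSVIEW_SIZE_AT_LAUNCH = 32 };

/* One offset variable per ivar, initialized to the compile-time offset. */
static ptrdiff_t ivar_offset_kittens = 28;
static ptrdiff_t ivar_offset_puppies = 32;

/* Launch-time fixup, in spirit: if the superclass grew, slide every
   subclass ivar offset by the difference. */
static void slide_ivars(size_t compiled_super_size, size_t actual_super_size) {
    assert(actual_super_size >= compiled_super_size);
    ptrdiff_t delta = (ptrdiff_t)(actual_super_size - compiled_super_size);
    ivar_offset_kittens += delta;
    ivar_offset_puppies += delta;
}

/* Ivar access reads the offset from a variable instead of baking in a constant. */
static void store_ivar(unsigned char *object, ptrdiff_t offset, uint32_t value) {
    memcpy(object + offset, &value, sizeof value);
}

static uint32_t load_ivar(const unsigned char *object, ptrdiff_t offset) {
    uint32_t value;
    memcpy(&value, object + offset, sizeof value);
    return value;
}
```

After slide_ivars(NSVIEW_SIZE_AT_COMPILE, NSVIEW_SIZE_AT_LAUNCH), a store through ivar_offset_kittens lands at offset 32 and leaves whatever the superclass keeps at offset 28 alone. The extra cost per access is one load of the offset variable.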

Infinity isn't as long as you think
(2008-11-30 3:29 AM)
My intuition about infinite time used to go something like this:
every possible event will happen, if you wait long enough.
Consider a random walk on a 2-D grid. Will the drunkard return to
the starting point? Yes, if you wait long enough: you can prove
that the random walk will, eventually, return. (Formally, it
returns "almost surely" or "with probability 1". For example, the
random walk could proceed due east forever and never return; it's
not impossible, just infinitely improbable.)
In fact, you can prove the same thing about every other point on
the grid, not just the start. Each point will ("almost surely") be
visited during the random walk. If you wait long enough, the
drunkard will wander everywhere. Score a point for the "everything
happens eventually" idea.
But that intuition is wrong. Forever is not always enough.
Move the drunkard to a 3-D grid. Now the probability of returning
to the start is not "almost sure". Instead, it's a little over
one-third. If you start three drunkards on a 3-D random walk, you
would expect only one of them to ever come back, even if you wait
forever. And the odds are even worse with more dimensions.
Some events still don't happen after infinite time. Infinity isn't
as long as you think.
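You can get a feel for the difference with a quick simulation. This is only a sketch: the generator is a small fixed-seed LCG chosen for repeatability, the function names are invented, and a finite step limit can only underestimate the true "ever returns" probability. The gap between 2-D and 3-D should still be easy to see.

```c
#include <assert.h>
#include <stdint.h>

/* Small deterministic 64-bit LCG so runs are repeatable. */
static uint64_t rng_state = 0x2545F4914F6CDD1DULL;

static uint32_t next_random(void) {
    rng_state = rng_state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (uint32_t)(rng_state >> 33);   /* keep the high-quality upper bits */
}

/* Walk on a dims-dimensional grid for at most max_steps steps.
   Returns 1 if the walker gets back to the origin, 0 otherwise. */
static int walk_returns(int dims, long max_steps) {
    assert(dims >= 1 && dims <= 8);
    long pos[8] = {0};
    for (long step = 0; step < max_steps; step++) {
        /* Pick one of the 2*dims neighbors: axis r/2, direction from r%2. */
        uint32_t r = next_random() % (uint32_t)(2 * dims);
        pos[r / 2] += (r % 2) ? 1 : -1;

        int at_origin = 1;
        for (int d = 0; d < dims; d++) {
            if (pos[d] != 0) at_origin = 0;
        }
        if (at_origin) return 1;
    }
    return 0;
}

/* Fraction of walkers that return to the origin within max_steps. */
static double return_fraction(int dims, int walkers, long max_steps) {
    assert(walkers > 0);
    int returned = 0;
    for (int i = 0; i < walkers; i++) {
        returned += walk_returns(dims, max_steps);
    }
    return (double)returned / walkers;
}
```

Comparing return_fraction(2, 1000, 2000) with return_fraction(3, 1000, 2000) shows the effect. Raising the step limit keeps nudging the 2-D figure upward, while the 3-D figure stalls near one-third.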
http://en.wikipedia.org/wiki/Random_walk
http://mathworld.wolfram.com/PolyasRandomWalkConstants.html

[objc explain]: objc_msgSend_fpret
(2008-11-16 7:00 PM)
objc_msgSend is the Objective-C message
dispatcher. objc_msgSend_stret is exactly the same,
but for methods that return values of struct types. And
objc_msgSend_fpret is for methods that return
floating-point values on some architectures.
|        | objc_msgSend_fpret return types  |
| i386   | float, double, long double       |
| x86_64 | long double                      |
| ppc    | none (identical to objc_msgSend) |
| ppc64  | none (identical to objc_msgSend) |
| arm    | none (identical to objc_msgSend) |
objc_msgSend_stret exists because the parameters are
passed in different places for struct-returning functions. That's
not the problem that objc_msgSend_fpret
solves. Instead, objc_msgSend_fpret exists to handle
the return value itself correctly. Specifically, the return value
of a message sent to nil on i386 and x86_64.
Messages to nil return zero if possible. On ppc,
objc_msgSend with a nil receiver clears registers r3,
r4, f1, and f2 before returning. This means any pointer or integer
or floating-point return value will be zero, and structs are
undefined. On ppc, objc_msgSend_fpret is unnecessary,
because clearing f1 and f2 is harmless if the caller is actually
expecting a value in r3 or r4.
i386 is different. The floating-point registers there are
historically derived from the 8087 FPU for the original 8086
CPU. The x87 unit is an odd beast: it has eight floating-point
registers, but the registers themselves are used as if they were a
stack. Values are pushed and popped from this stack even though
the stack is stored in registers and has a maximum of eight
entries.
For return values, the callee pushes the value on the x87
stack and the caller pops it. This is fine for C functions, but
not so good when objc_msgSend returns
zero. objc_msgSend does not know what return type the
caller really wants, and it must not push a zero on the x87
stack unless the caller expects to pop a floating-point value.
The solution is objc_msgSend_fpret. During a message
to nil on i386, objc_msgSend_fpret pushes a zero on
the x87 stack which the caller will pop, and
objc_msgSend does not. The compiler knows which return
type the caller expects, and emits a call to the matching dispatcher. On ppc,
ppc64, and arm, objc_msgSend_fpret is identical to
objc_msgSend and is usually unused.
What about x86_64? This architecture is one step forward,
two steps back. The good news is that return values of types
float and double are returned in the XMM
registers. objc_msgSend can handle that itself just
like ppc. The bad news is that long double still uses
the x87 stack. So objc_msgSend_fpret still exists,
but is only used for long double on x86_64. The worse
news is that C99's complex long double returns two
values on the x87 stack. So now there's
objc_msgSend_fp2ret for that case only. Currently no
compiler actually uses objc_msgSend_fp2ret, so
hopefully nobody is writing code that sends a message to nil on
x86_64 and expects a zero return value of type complex long
double.
One last thing: Mac OS X 10.4 and earlier did not return zero for
floating-point messages to nil on ppc. Be careful if you're
writing code for those older systems. All other architectures have
always returned floating-point zero, including i386 on 10.4.
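The table of return types can be restated as a small C function. The enum and function names are invented for this sketch. It covers only non-struct returns and leaves out complex long double (the objc_msgSend_fp2ret case).

```c
#include <assert.h>

typedef enum { ARCH_I386, ARCH_X86_64, ARCH_PPC, ARCH_PPC64, ARCH_ARM } Arch;
typedef enum { RET_INT_OR_POINTER, RET_FLOAT, RET_DOUBLE, RET_LONG_DOUBLE } ReturnKind;
typedef enum { DISPATCH_MSGSEND, DISPATCH_MSGSEND_FPRET } Dispatcher;

/* Which dispatcher a compiler would pick for a non-struct return type,
   following the rules described in this post. */
static Dispatcher dispatcher_for(Arch arch, ReturnKind kind) {
    int is_fp = (kind == RET_FLOAT || kind == RET_DOUBLE || kind == RET_LONG_DOUBLE);
    switch (arch) {
    case ARCH_I386:
        /* Every floating-point return travels on the x87 stack. */
        return is_fp ? DISPATCH_MSGSEND_FPRET : DISPATCH_MSGSEND;
    case ARCH_X86_64:
        /* float and double use XMM registers; only long double uses x87. */
        return kind == RET_LONG_DOUBLE ? DISPATCH_MSGSEND_FPRET : DISPATCH_MSGSEND;
    case ARCH_PPC:
    case ARCH_PPC64:
    case ARCH_ARM:
        /* objc_msgSend_fpret is identical to objc_msgSend here. */
        return DISPATCH_MSGSEND;
    default:
        assert(!"unknown architecture");
        return DISPATCH_MSGSEND;
    }
}
```

For example, dispatcher_for(ARCH_X86_64, RET_DOUBLE) gives DISPATCH_MSGSEND, while the same return type on ARCH_I386 gives DISPATCH_MSGSEND_FPRET.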

[objc explain]: objc_msgSend_stret
(2008-10-30 10:28 PM)
objc_msgSend is the Objective-C message
dispatcher. It's the function-calling function, using the selector and
the receiver object's class to decide where to jump
to. objc_msgSend_stret is exactly the same, but for
methods that return values of struct types. Why does
objc_msgSend_stret exist?
Because the machine-level guts of the C language require it,
and Objective-C methods are C functions if you tilt your head and
squint a bit.
On most processors, the first few parameters to a function are
passed in CPU registers, and return values are handed back in CPU
registers. Objective-C
methods do the same, but with id self and SEL
_cmd as the first two parameters. Here's a PowerPC example:
-(int) method:(id)arg;
r3 = self
r4 = _cmd, @selector(method:)
r5 = arg
(on exit) r3 = returned int
CPU registers work fine for small return values like ints and
pointers, but structure values can be too big to fit. For structs,
the caller allocates stack space for the returned struct,
passes the address of that storage to the function, and the
function writes its return value into that space. The address of
the struct is an implicit first parameter just like
self and _cmd:
-(struct st) method:(id)arg;
r3 = &struct_var (in caller's stack frame)
r4 = self
r5 = _cmd, @selector(method:)
r6 = arg
(on exit) return value written into struct_var
Now consider objc_msgSend's task. It uses
_cmd and self->isa to choose the
destination. But self and _cmd are in
different registers if the method will return a struct, and
objc_msgSend can't tell that in advance. Thus
objc_msgSend_stret: just like
objc_msgSend, but reading its values from different
registers.
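Registers aren't visible from C, but the parameter shift is. Here is a sketch of the same function written twice: once the way you would write it, and once the way the machine-level call roughly looks for a stack-returned struct. The names make_st and make_st_lowered are invented for this example.

```c
#include <assert.h>
#include <stddef.h>

struct st { int a, b, c, d; };

/* What the programmer writes: a function that returns a struct by value. */
static struct st make_st(int seed) {
    struct st s = { seed, seed + 1, seed + 2, seed + 3 };
    return s;
}

/* Roughly what the call becomes: the caller passes the address of its own
   storage as an extra leading argument, pushing every other argument
   one slot to the right, and the callee writes its result through it. */
static void make_st_lowered(struct st *result, int seed) {
    assert(result != NULL);
    result->a = seed;
    result->b = seed + 1;
    result->c = seed + 2;
    result->d = seed + 3;
}
```

The seed argument is the first parameter in one version and the second in the other. That is the same shift that moves self and _cmd over by one register, and it is why the dispatcher needs to know which convention is in play.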
But there's a catch.
On most architectures, some small C structs are returned in registers
after all, instead of using the struct-address first parameter
that objc_msgSend_stret expects. If the struct type
falls into this category, then objc_msgSend is used
instead. So the "struct return" part of
objc_msgSend_stret refers to the architecture's
definition of a stack-returned struct, which may not match what C calls a struct.
The rules for which struct types return in registers are always
arcane, sometimes insane. ppc32 is trivial: structs never return in
registers. i386 is straightforward: structs with
sizeof exactly equal to 1, 2, 4, or 8 return in
registers. x86_64 is more complicated, including rules for
returning floating-point struct fields in FPU registers, and
ppc64's rules and exceptions will make your head spin. The gory
details are documented in the Mac
OS X ABI Guide, though as usual if the documentation and the
compiler disagree then the documentation is wrong.
If you're calling objc_msgSend directly and need to
know whether to use objc_msgSend_stret for a
particular struct type, I recommend the empirical approach: write
a line of code that calls your method, compile it on each
architecture you care about, and look at the
assembly code to see which dispatch function the compiler
uses.
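For the two simple architectures, the rules as stated above fit in a few lines of C. This sketch uses invented names and deliberately leaves out x86_64 and ppc64, whose rules depend on more than size. Treat it as a restatement of this post, not a substitute for checking the compiler's output.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ARCH_PPC32, ARCH_I386 } Arch;

/* Returns 1 if a struct of the given size comes back through a hidden
   address parameter (so objc_msgSend_stret applies), or 0 if it comes
   back in registers (so plain objc_msgSend applies). */
static int returns_on_stack(Arch arch, size_t struct_size) {
    assert(struct_size > 0);
    switch (arch) {
    case ARCH_PPC32:
        return 1;   /* structs never return in registers */
    case ARCH_I386:
        /* sizeof exactly 1, 2, 4, or 8 returns in registers */
        return !(struct_size == 1 || struct_size == 2 ||
                 struct_size == 4 || struct_size == 8);
    default:
        assert(!"architecture not covered by this sketch");
        return 1;
    }
}
```

By these rules an 8-byte struct on i386 goes through objc_msgSend, while a 16-byte struct goes through objc_msgSend_stret.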

updated: Valgrind for Mac OS X
(2008-10-27 5:47 PM)
Update for Valgrind for Mac OS
X: fixes a hang at launch when reading debug info.
The entire difference between valgrind-opensource-3 and
valgrind-opensource-4 is:
-#if !defined(VGO_DARWIN)
+#if !defined(VGO_darwin)
D'oh!

Space is time: how your CS theory class lied to you
(2008-10-14 1:04 AM)
In your computer science algorithms course, you learned about
space-time tradeoffs. An algorithm that requires lots of time can
often be changed to take less time but more space. A wide range of
performance optimizations work this way, from caching to
memoization to loop unrolling.
But the "tradeoff" is a lie. Space is time.
Every use of space incurs a time cost. In
your theory class, the time cost of space is swept under the big-O
rug. On
your big-iron machine running a single computational workload or
CPU benchmark, the time cost of space is small compared to the
other time costs involved. But in the real world of consumer-grade
devices, with limited memory and power, the time cost of space is
tremendous. A performance optimization that tries to trade less
time for more space often ends up requiring more time and more
space.
The gcc compiler uses a garbage collector to manage
its memory. To save time, gcc does not even start to
collect any garbage until its memory size is quite large.
"That's fine", you might say, "my new
machine has gigabytes of memory". But the kernel needs a
big chunk of memory just to keep track of the rest of the
memory. And you're running a web browser, and an email client, and
an IDE, and music and chat and clock and search and sync and backup
and everything else you didn't have a decade or two ago.
And your build system runs multiple
gcc commands in parallel because your machine has
multiple cores. Now your memory capacity isn't so big after all:
the system starts paging to disk, your compiler performance
falls off a cliff, and the web browser turns sluggish
too. In this memory-constrained environment, trying to use less
time (skipping GC) and more space (accumulating garbage) has
backfired badly.
Space is time. An optimization that is faster on a
well-endowed device may be much slower everywhere else. Assume
your customers' machines have less memory than yours, and design
and test accordingly.
At one modern extreme, the iPhone has only 128 MB of
memory. Ever seen iPhone Safari "forget" a web page and
re-download it after you switched tabs or apps? The system ran out
of memory and Safari had to throw the page away. On the iPhone,
your favorite space-time tradeoff in your own program may
sacrifice the user's web page, requiring a repeat download across
a slow network. Good for your program, perhaps, but bad for the
user.
Space is time. An
optimization that makes your program faster may
make the user's system slower overall. Play well with others.
Most of Mac OS X is compiled with -Os instead of
-O3, to reduce code size. Mac OS X's memory allocator
is slower than other allocators under some workloads, because it
tries to avoid hoarding unused memory where other processes can't
use it. Mac OS X uses dynamic shared libraries exclusively, then
combines multiple shared libraries into a single shared cache,
then carefully re-processes that shared cache, all to save space
across multiple processes.
Many ideas for faster cross-library calls or accelerated
Objective-C method dispatch or JIT-based optimization have been
abandoned because they need too much space and do not save enough time.
CPU-focused optimization can be just as evil as the infamous
premature optimization. Space is time.