On the SBCL PPC calling convention, and its implications for GC.

* Basics

The call VOPs form a cross-product of several choices.  Functions are
named or anonymous (fdefinition object or function object).  Calls may
be of fixed arity or variable arity (funcall vs. apply).  There may be
one expected return value, multiple expected return values, or a
tail-call may be performed.

All of these share some commonalities with respect to frame
allocation, register args, and the lexenv, code, and fdefn registers.

Certain "functional" objects (notably closures and funcallable
instances) have trampolines to load the real function from the object
and jump to it.

The important registers as far as the GC is concerned are reg_LEXENV,
reg_CODE, reg_LRA, reg_FDEFN, and reg_LIP.

We can worry about argument-passing and frame allocation later.

* Outward control flow

The first divergence is named vs. anonymous.  Calls to named functions
load reg_FDEFN with the fdefinition object.  Calls to anonymous
functions load reg_LEXENV with the given function object, then load
the "true" function into a temporary register from the closure-fun
slot of the given function (the same slot offset serves as the closure
fun slot, the simple-fun self slot, and the funcallable-instance
trampoline slot).

The entry-point is computed into reg_LIP, either by loading the
fdefn-raw-addr-slot in the named case, or dead reckoning from the
function object in the anonymous case.

At some point during all of this, reg_LRA is either loaded (for tail
calls) or calculated from reg_CODE.  More on this later.

Once the entry-point is known, it is transferred to the count register
and then jumped to.

A point here for the named case is that when the entry point is loaded
into reg_LIP from the fdefn-raw-addr slot, a pointer to the enclosing
object, either code component or function, is not stored in a
register, thus violating the basic GC invariant on a live reg_LIP.
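
As a rough sketch of the two ways the entry point ends up in reg_LIP,
the following C model may help.  It is not the VOP code: the constants
are assumed values for a 32-bit build, SIMPLE_FUN_CODE_WORDS is a
made-up header size, and struct fdefn_model is a hypothetical stand-in
for the two fdefn slots that matter here.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; the real ones come from the
 * generated runtime headers and may differ. */
#define N_WORD_BYTES          4u
#define LOWTAG_MASK           7u
#define FUN_POINTER_LOWTAG    5u
#define SIMPLE_FUN_CODE_WORDS 6u   /* hypothetical header size, in words */

typedef uint32_t lispobj;

/* Anonymous case: reg_LIP is dead-reckoned from the tagged function
 * pointer, so the function object is available as a base for reg_LIP. */
static uint32_t entry_from_function(lispobj fun)
{
    assert((fun & LOWTAG_MASK) == FUN_POINTER_LOWTAG);
    return fun - FUN_POINTER_LOWTAG + SIMPLE_FUN_CODE_WORDS * N_WORD_BYTES;
}

/* Hypothetical model of an fdefinition object. */
struct fdefn_model {
    lispobj  fun;       /* fdefn-fun slot: tagged function object */
    uint32_t raw_addr;  /* fdefn-raw-addr slot: untagged entry point */
};

/* Named case: reg_LIP is loaded straight from the raw-addr slot.
 * Nothing here puts the enclosing object in a register, which is
 * the invariant violation described above. */
static uint32_t entry_from_fdefn(const struct fdefn_model *f)
{
    return f->raw_addr;
}
```

Both paths yield the same address for a simple function; the
difference is only in whether a base object is live alongside it.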

** The closure trampoline

When a closure or funcallable-instance is named by an fdefinition,
reg_LEXENV is not set up by the normal calling convention.  In this
case, the fdefn-raw-addr is pointed to a small code fragment in the
runtime (ppc-assem.S) called closure_tramp.

The closure trampoline is five instructions long.  It starts by
loading reg_LEXENV with the closure from the fdefn-fun slot of the
fdefinition, then loads reg_CODE from the closure-fun slot of the
closure, calculates the actual entry point, and jumps to the function
by way of the count register.
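
The register traffic of the trampoline can be modelled in C roughly as
follows.  This is only a sketch: the real thing is PPC assembly in
ppc-assem.S, the struct names here are hypothetical, and the
entry-point calculation is reduced to a plain copy.

```c
#include <assert.h>

/* Hypothetical stand-ins for the heap objects involved. */
struct tramp_simple_fun { int id; };
struct tramp_closure    { struct tramp_simple_fun *fun; };  /* closure-fun slot */
struct tramp_fdefn      { struct tramp_closure *fun; };     /* fdefn-fun slot */

/* Hypothetical model of the registers the GC cares about. */
struct tramp_regs {
    void *lexenv;  /* reg_LEXENV */
    void *code;    /* reg_CODE */
    void *fdefn;   /* reg_FDEFN, set up by the named calling sequence */
    void *lip;     /* reg_LIP */
};

static void closure_tramp_model(struct tramp_regs *r)
{
    struct tramp_fdefn *fdefn = r->fdefn;
    struct tramp_closure *closure = fdefn->fun;

    assert(closure != 0);     /* a NULL fdefn-fun pairs with undefined_tramp */
    r->lexenv = closure;      /* load reg_LEXENV from the fdefn-fun slot */
    r->code = closure->fun;   /* load reg_CODE from the closure-fun slot */
    r->lip = r->code;         /* stands in for the entry-point calculation */
    /* the real trampoline now moves reg_LIP to the count register and jumps */
}
```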

*** Should reg_LEXENV be set by the named calling convention?

Argument in favor: reg_LEXENV is then a valid base for reg_LIP during
call of non-closures.  Calling a closure doesn't require a base for
reg_LIP, in theory.  For closures, we still use the trampoline, which
reloads things anyway.

Argument in disfavor: It's an extra load during the funcall
sequence.

Further elaboration: Once reg_LEXENV is loaded, we can use the
"anonymous" calling sequence, which eliminates the use of the closure
tramp.

Argument in favor of using the "anonymous" calling sequence: fdefn
update becomes atomic.

Argument in disfavor of using the "anonymous" calling sequence: It's
two extra loads during the funcall sequence.

Alternative approach: Resurrect the trace-table, so that the GC /
fake_foreign_function_call can use it to sort out what's what during
the function calling sequence.

Missed consideration: We forgot about the undefined_tramp, which is
paired with the fdefn-fun-slot being NULL.

** The funcallable instance trampoline

When calling a funcallable-instance there is another step beyond the
possible use of the closure trampoline.  The "closure-fun" slot is
actually the funcallable-instance-trampoline slot, and always points
to a short, "fake" function in the runtime.  This function,
funcallable_instance_tramp, looks superficially like a real function
(it has a dummied up function header), but no code component (similar
to the undefined-function trampoline in that respect).

The funcallable instance trampoline is five instructions long.  It
starts by loading reg_LEXENV with the "real" function from the
funcallable-instance function slot from reg_LEXENV (implying that the
named case goes through the closure trampoline first), then loads
reg_FDEFN with the closure-fun slot of the function, calculates the
actual entry point, and jumps to the function by way of the count
register.

The use of reg_LEXENV to hold both the funcallable-instance and the
underlying function requires the underlying function itself to be a
closure in order for it to be able to find the funcallable-instance
again.

This brings to mind a question:

  * Is there any significance to the funcallable-instance tramp using
    reg_FDEFN instead of reg_CODE to hold the "real" function?

    If not, perhaps it -should- be reg_CODE, in order to simplify GC
    concerns for the program counter?

    Alternately, since it falls under the reg_LIP entry-point
    handling, perhaps not.

** GC considerations

First, reg_CODE is never overwritten while the program-counter is
within the body of the outbound (caller) code component.

Second, the COUNT register is always equal to reg_LIP when it is
valid.  If it's not equal to reg_LIP then it's not live.

Third, if the program counter is not valid in relation to reg_CODE,
it is valid in relation to reg_LIP.

Fourth, reg_LIP is only ever "unbased", thus subject to arbitrary
motion with respect to its destination, when using fdefn-raw-addr.

Fifth, using the fdefn-raw-addr is an invitation to race conditions,
not only with respect to reg_LIP, but also with respect to updating
the fdefinition with a new function.

Sixth, in order for reg_LIP to not be "unbased", the fdefn-fun must be
loaded prior to the fdefn-raw-addr.  This will do no harm if the
function is undefined, a closure, or a funcallable instance (as it
will go through the appropriate trampolines), and will have the
function stored in a register to use as a base for reg_LIP.

* Inward control flow

During function calling, a "boxed" return-address object, known as an
LRA, is passed in reg_LRA.  This object is maintained relative to its
enclosing code component.

The LRA is a tagged (other-pointer) value, and thus aligned in memory.
It is embedded within the instruction stream of a code-object.  Its
first word has return-pc-header-widetag, and its header data is the
offset in words from the start of the code-object to the LRA header,
allowing the GC to locate the code-object from the LRA.  The
subsequent words are part of the instruction stream.

During function return, the LRA is again loaded into reg_LRA, the
actual entry point is computed into reg_LIP, and the link register is
loaded from reg_LIP and then branched to.

There is one LRA in the system with a header value of zero.  This LRA
is in call_into_lisp, and thus not subject to GC.
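
To make the header arithmetic concrete, here is a small C sketch of
how a code-object address could be recovered from an LRA.  The
constants are assumptions for a 32-bit build (8 widetag bits, 4-byte
words, other-pointer lowtag 7), and the function names are
hypothetical rather than taken from the runtime.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only. */
#define LRA_WORD_BYTES        4u
#define LRA_WIDETAG_BITS      8u
#define LRA_LOWTAG_MASK       7u
#define OTHER_POINTER_LOWTAG  7u

/* The header data sits above the widetag: the offset, in words, from
 * the start of the code-object to the LRA header. */
static uint32_t lra_offset_words(uint32_t header)
{
    return header >> LRA_WIDETAG_BITS;
}

/* Given the tagged LRA and its header word, return the untagged
 * address of the enclosing code-object. */
static uint32_t code_from_lra(uint32_t lra, uint32_t header)
{
    assert((lra & LRA_LOWTAG_MASK) == OTHER_POINTER_LOWTAG);
    return (lra - OTHER_POINTER_LOWTAG)
           - lra_offset_words(header) * LRA_WORD_BYTES;
}
```

With a header value of zero this returns the address of the LRA
itself, which is the case for the LRA in call_into_lisp, so a GC
following this scheme would need to recognise that LRA and leave it
alone.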

** GC considerations

First, reg_CODE is never overwritten while the program-counter is
within the body of the outbound (callee) code component.

Second, the LINK register is equal to reg_LIP when it is valid
(although, see the considerations for assembly-routines, below).

Third, if the program counter is not valid in relation to reg_CODE,
it is valid in relation to reg_LRA.

Fourth, reg_LIP is never "unbased", it is always at some offset from
reg_LRA, and within the body of the code-component in which reg_LRA is
embedded.

* Assembly-routines

Assembler routines don't move (they're in the read-only space).
Further, they preserve reg_CODE unless they do a tail-call to a
support function.  If they can do a tail-call then they use the full
return convention (with an LRA object), so that falls under Inward
control flow above.  Otherwise, they either don't return, or use the
link register, which is then valid relative to reg_CODE.

** GC considerations

Beyond the same considerations as for Inward control flow for reg_LRA,
the link register should be rebased as an interior pointer if it is
within the scope of reg_CODE.

* Overall GC considerations

There are three registers which partake of the interior-pointer nature
other than reg_LIP: the program counter, the link register, and the
count register.

First, the link register: When this is live, it is either an interior
pointer for reg_CODE or it is equal to reg_LIP and an interior pointer
for reg_LRA.

Next, the count register: When this is live, it is equal to reg_LIP or
contains the address of call_into_c.

Finally, the program counter: This is usually an interior pointer for
reg_CODE, but may be outside the heap space, an interior pointer for
reg_LRA (during return processing), or a later interior pointer for
whatever reg_LIP points to (during XEP processing).
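
One way to picture the handling of these interior pointers is as a
save/restore of an offset around a possible move of the base object.
The C sketch below assumes the base register has already been
identified (reg_CODE, reg_LRA, or whatever reg_LIP is based on); the
struct and function names are hypothetical and this is not the
runtime's actual code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pairing of an interior-pointer register with its base. */
struct interior_model {
    uintptr_t *reg;     /* e.g. reg_LIP, link, count, or the pc */
    uintptr_t *base;    /* e.g. reg_CODE or reg_LRA */
    uintptr_t  offset;  /* byte offset of *reg from *base */
};

/* Before GC: remember how far the interior pointer is from its base. */
static void interior_save(struct interior_model *ip)
{
    assert(*ip->reg >= *ip->base);  /* assumed: interior pointers point forward */
    ip->offset = *ip->reg - *ip->base;
}

/* After GC: the base may have moved, so recompute the interior pointer. */
static void interior_restore(struct interior_model *ip)
{
    *ip->reg = *ip->base + ip->offset;
}
```

An "unbased" reg_LIP is exactly the case where no valid base can be
supplied to a scheme like this.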

* STUPID SHIT

** When the fdefn-raw-addr is loaded into LIP, is the boxed pointer loaded anywhere?

Otherwise, surely there's a window for the GC to screw things up?  I
took this out for GC purposes, didn't I? (Apparently not, it was never
there.)

** fake_foreign_function_call() gets first crack at the interrupt context

There is some odd logic involving reg_CODE having FUN_POINTER_LOWTAG,
but nothing actually modifies reg_CODE.

** There is a write-ordering issue with setting fdefinition functions

For "proper" atomic updates, the raw-addr slot must first be set to
the closure trampoline, then the function slot updated, then the
raw-addr slot may be reset to the "correct" raw-addr for a simple
function.

And all of this requires memory barriers, which we aren't going to put
in the calling convention, thus leaving a hole: function redefinition
is not atomic with respect to itself or to function calling.
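
The writer's half of the three-step update could look something like
the following, using C11 atomics as a stand-in for the barriers.  The
struct and the function name are hypothetical, and this is only a
sketch of the ordering: a reader without matching acquire loads (that
is, the calling sequence as it stands) can still see the slots out of
order.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical model of the two fdefn slots involved. */
struct fdefn_slots {
    _Atomic uintptr_t fun;       /* fdefn-fun slot */
    _Atomic uintptr_t raw_addr;  /* fdefn-raw-addr slot */
};

static void set_fdefn_fun_model(struct fdefn_slots *f,
                                uintptr_t new_fun,
                                uintptr_t new_raw_addr,
                                uintptr_t closure_tramp)
{
    assert(f != 0);
    /* 1. Route callers through the closure trampoline, which finds
     *    the function via the fdefn-fun slot. */
    atomic_store_explicit(&f->raw_addr, closure_tramp, memory_order_release);
    /* 2. Install the new function; the release store keeps step 1
     *    ordered before it. */
    atomic_store_explicit(&f->fun, new_fun, memory_order_release);
    /* 3. Only now point raw-addr at the "correct" entry point. */
    atomic_store_explicit(&f->raw_addr, new_raw_addr, memory_order_release);
}
```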

Alternately, change the calling convention to not use the raw-addr
slot, and include a write barrier in the set-fdefn-fun sequence.

** Static functions are called by directly loading their fdefn raw-addr

Okay, WTF?  This just /doesn't work/.  Aside from preventing the use
of closures or funcallable instances as static functions, it also
ignores that static functions aren't stored in static space, only
their fdefinition objects are.