On the SBCL PPC calling convention, and its implications for GC.

* Basics

Cross-product effect for VOPs.

Functions are named or anonymous (fdefinition object or function
object).

Calls may be of fixed arity or variable arity (funcall vs. apply).

There may be one expected return value, multiple expected return
values, or a tail-call may be performed.

Some commonalities with respect to frame allocation, register args,
lexenv, code, and fdefn registers, etc.

Certain "functional" objects (notably closures and funcallable
instances) have trampolines to load the real function from the object
and jump to it.

The important registers as far as the GC is concerned are reg_LEXENV,
reg_CODE, reg_LRA, reg_FDEFN, and reg_LIP.  We can worry about
argument-passing and frame allocation later.

* Outward control flow

The first divergence is named vs. anonymous.  Named functions load
reg_FDEFN with the fdefinition object.  Anonymous functions load
reg_LEXENV with the given function object, then load the "true"
function into a temporary register from the closure-fun-slot of the
given function (this is the closure fun slot, the simple-fun self
slot, and the funcallable-instance trampoline slot).

The entry-point is computed into reg_LIP, either by loading the
fdefn-raw-addr-slot in the named case, or dead reckoning from the
function object in the anonymous case.

At some point during all of this, reg_LRA is either loaded (for tail
calls) or calculated from reg_CODE.  More on this later.

Once the entry-point is known, it is transferred to the count register
and then jumped to.

A point here for the named case is that when the entry point is loaded
into reg_LIP from the fdefn-raw-addr slot, a pointer to the enclosing
object, either code component or function, is not stored in a
register, thus violating the basic GC invariant on a live reg_LIP.

** The closure trampoline

When a closure or funcallable-instance is named by an fdefinition,
reg_LEXENV is not set up by the normal calling convention.
In this case, the fdefn-raw-addr is pointed to a small code fragment
in the runtime (ppc-assem.S) called closure_tramp.

The closure trampoline is five instructions long.  It starts by
loading reg_LEXENV with the closure from the fdefn-fun slot of the
fdefinition, then loads reg_CODE from the closure-fun slot of the
closure, calculates the actual entry point, and jumps to the function
by way of the count register.

*** Should reg_LEXENV be set by the named calling convention?

Argument in favor: reg_LEXENV is then a valid base for reg_LIP during
call of non-closures.  Calling a closure doesn't require a base for
reg_LIP, in theory.  For closures, we still use the trampoline, which
reloads things anyway.

Argument in disfavor: It's an extra load during the funcall sequence.

Further elaboration: Once reg_LEXENV is loaded, we can use the
"anonymous" calling sequence, which eliminates the use of the closure
tramp.

Argument in favor of using the "anonymous" calling sequence: fdefn
update becomes atomic.

Argument in disfavor of using the "anonymous" calling sequence: It's
two extra loads during the funcall sequence.

Alternative approach: Resurrect the trace-table, as the GC /
fake_foreign_function_call can thus use it to sort out what's what
during the function calling sequence.

Missed consideration: We forgot about the undefined_tramp, which is
paired with the fdefn-fun-slot being NULL.

** The funcallable instance trampoline

When calling a funcallable-instance there is another step beyond the
possible use of the closure trampoline.  The "closure-fun" slot is
actually the funcallable-instance-trampoline slot, and always points
to a short, "fake" function in the runtime.  This function,
funcallable_instance_tramp, looks superficially like a real function
(it has a dummied up function header), but no code component (similar
to the undefined-function trampoline in that respect).

The funcallable instance trampoline is five instructions long.
It starts by loading reg_LEXENV with the "real" function from the
funcallable-instance function slot from reg_LEXENV (implying that the
named case goes through the closure trampoline first), then loads
reg_FDEFN with the closure-fun slot of the function, calculates the
actual entry point, and jumps to the function by way of the count
register.

The use of reg_LEXENV to hold both the funcallable-instance and the
underlying function requires the underlying function itself to be a
closure in order for it to be able to find the funcallable-instance
again.

This brings to mind a question:

  * Is there any significance to the funcallable-instance tramp using
    reg_FDEFN instead of reg_CODE to hold the "real" function?  If
    not, perhaps it -should- be reg_CODE, in order to simplify GC
    concerns for the program counter?  Alternately, since it falls
    under the reg_LIP entry-point handling, perhaps not.

** GC considerations

First, reg_CODE is never overwritten while the program-counter is
within the body of the outbound (caller) code component.

Second, the COUNT register is always equal to reg_LIP when it is
valid.  If it's not equal to reg_LIP then it's not live.

Third, if the program counter is not valid in relation to reg_CODE,
it is valid in relation to reg_LIP.

Fourth, reg_LIP is only ever "unbased", thus subject to arbitrary
motion with respect to its destination, when using fdefn-raw-addr.

Fifth, using the fdefn-raw-addr is an invitation to race conditions,
not only with respect to reg_LIP, but also with respect to updating
the fdefinition with a new function.

Sixth, in order for reg_LIP to not be "unbased", the fdefn-fun must be
loaded prior to the fdefn-raw-addr.  This will do no harm if the
function is undefined, a closure, or a funcallable instance (as it
will go through the appropriate trampolines), and will have the
function stored in a register to use as a base for reg_LIP.
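
The difference between a "based" and an "unbased" reg_LIP can be shown
with a toy model.  This is only a sketch in Python, not the runtime's
code: HeapObject, find_base, and collect are hypothetical names, and
the 0x18 entry-point offset is made up for illustration.  The idea is
that when some register holds the enclosing object, the collector can
save reg_LIP as an offset from it and re-derive it after the object
moves; when no such register exists, reg_LIP is left pointing at the
old location.

```python
from dataclasses import dataclass


@dataclass
class HeapObject:
    """A movable object occupying [address, address + size) in a toy heap."""
    address: int
    size: int


def find_base(lip, candidates):
    """Return the candidate object whose extent contains lip, or None."""
    for obj in candidates:
        if obj.address <= lip < obj.address + obj.size:
            return obj
    return None


def collect(lip, base_objects, relocations):
    """Move objects and return the value reg_LIP would have afterwards.

    base_objects: objects currently held in boxed registers (hypothetical
                  stand-ins for reg_LEXENV, reg_CODE, and so on).
    relocations:  list of (object, new_address) pairs applied by the "GC".

    A based lip is saved as an offset from its base and re-derived after
    the move.  An unbased lip is returned unchanged, i.e. possibly stale.
    """
    base = find_base(lip, base_objects)
    offset = lip - base.address if base is not None else None
    for obj, new_address in relocations:
        obj.address = new_address
    if base is None:
        return lip
    return base.address + offset


# Based: the function object is in a register, so lip follows the move.
fun = HeapObject(address=0x1000, size=0x100)
print(hex(collect(fun.address + 0x18, [fun], [(fun, 0x8000)])))

# Unbased: nothing holds the function, so lip still names the old address.
fun = HeapObject(address=0x1000, size=0x100)
print(hex(collect(fun.address + 0x18, [], [(fun, 0x8000)])))
```

In this model, the sixth point amounts to making sure the function is
in base_objects before reg_LIP is loaded.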
* Inward control flow

During function calling, a "boxed" return-address object, known as an
LRA, is passed in reg_LRA.  This object is maintained relative to its
enclosing code component.

The LRA is a tagged (other-pointer) value, and thus aligned in memory.
It is embedded within the instruction stream of a code-object.  Its
first word has return-pc-header-widetag, and its header data is the
offset in words from the start of the code-object to the LRA header,
allowing the GC to locate the code-object from the LRA.  The
subsequent words are part of the instruction stream.

During function return, the LRA is again loaded into reg_LRA, then the
actual entry point is computed into reg_LIP, the link register loaded
from reg_LIP and then branched to.

There is one LRA in the system with a header value of zero.  This LRA
is in call_into_lisp, and thus not subject to GC.

** GC considerations

First, reg_CODE is never overwritten while the program-counter is
within the body of the outbound (callee) code component.

Second, the LINK register is equal to reg_LIP when it is valid
(although, see the considerations for assembly-routines, below).

Third, if the program counter is not valid in relation to reg_CODE,
it is valid in relation to reg_LRA.

Fourth, reg_LIP is never "unbased", it is always at some offset from
reg_LRA, and within the body of the code-component in which reg_LRA is
embedded.

* Assembly-routines

Assembler routines don't move (they're in the read-only space).
Further, they preserve reg_CODE unless they do a tail-call to a
support function.  If they can do a tail-call then they use the full
return convention (with an LRA object), so that falls under Inward
control flow above.  Otherwise, they either don't return, or use the
link register, which is then valid relative to reg_CODE.

** GC considerations

Beyond the same considerations as for Inward control flow for reg_LRA,
the link register should be rebased as an interior pointer if it is
within the scope of reg_CODE.
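
The LRA header arithmetic from Inward control flow and the
link-register rule just above can be sketched together.  This is a toy
Python model under several assumptions, none of which come from these
notes: a 4-byte word, a widetag occupying the low 8 bits of the header,
a placeholder widetag value, and untagged addresses throughout.  The
function names are hypothetical.

```python
WORD_BYTES = 4            # assumption: 32-bit PPC word size
WIDETAG_BITS = 8          # assumption: widetag is the low 8 header bits
RETURN_PC_WIDETAG = 0x36  # hypothetical placeholder, not the real value


def make_lra_header(offset_words):
    """Build a header word: offset in words above the widetag bits."""
    return (offset_words << WIDETAG_BITS) | RETURN_PC_WIDETAG


def code_start_from_lra(lra_address, header):
    """Return the untagged start of the enclosing code object.

    lra_address is the untagged address of the LRA header word.
    Returns None for the zero-offset LRA, which has no enclosing code
    object that the GC needs to find.
    """
    if header & ((1 << WIDETAG_BITS) - 1) != RETURN_PC_WIDETAG:
        raise ValueError("not a return-pc header")
    offset_words = header >> WIDETAG_BITS
    if offset_words == 0:
        return None
    return lra_address - offset_words * WORD_BYTES


def rebase_link(link, old_code, code_size, new_code):
    """Rebase the link register if it points into the moved code object.

    If link lies within [old_code, old_code + code_size) it is treated
    as an interior pointer and keeps its offset; otherwise it is
    returned unchanged.
    """
    if old_code <= link < old_code + code_size:
        return new_code + (link - old_code)
    return link


header = make_lra_header(12)          # LRA sits 12 words into the code
print(hex(code_start_from_lra(0x2030, header)))
print(hex(rebase_link(0x2044, 0x2000, 0x100, 0x9000)))
```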
* Overall GC considerations

There are three registers which partake of the interior-pointer nature
other than reg_LIP: the program counter, the link register, and the
count register.

First, the link register: When this is live, it is either an interior
pointer for reg_CODE or it is equal to reg_LIP and an interior pointer
for reg_LRA.

Next, the count register: When this is live, it is equal to reg_LIP or
contains the address of call_into_c.

Finally, the program counter: This is usually an interior pointer for
reg_CODE, but may be outside the heap space, an interior pointer for
reg_LRA (during return processing), or a later interior pointer for
whatever reg_LIP points to (during XEP processing).

* STUPID SHIT

** When the fdefn-raw-addr is loaded into LIP, is the boxed pointer loaded anywhere?

Otherwise, surely there's a window for the GC to screw things up?

I took this out for GC purposes, didn't I?  (Apparently not, it was
never there.)

** fake_foreign_function_call() gets first crack at the interrupt context

There is some odd logic involving reg_CODE having FUN_POINTER_LOWTAG,
but nothing actually modifies reg_CODE.

** There is a write-ordering issue with setting fdefinition functions

For "proper" atomic updates, the raw-addr slot must first be set to
the closure trampoline, then the function slot updated, then the
raw-addr slot may be reset to the "correct" raw-addr for a simple
function.  And all of this requires memory barriers, which we aren't
going to put in the calling convention, thus leaving a hole: function
redefinition is not atomic with respect to itself or to function
calling.

Alternately, change the calling convention to not use the raw-addr
slot, and include a write barrier in the set-fdefn-fun sequence.

** Static functions are called by directly loading their fdefn raw-addr

Okay, WTF?  This just /doesn't work/.
Aside from preventing the use of closures or funcallable instances as static functions, it also ignores that static functions aren't stored in static space, only their fdefinition objects are.