A Log of an Attempt to Port SBCL to Linux on the ARM Architecture

By Alastair Bridgewater

* 2007-Dec-28:

Port begins.

My port host is an x86-64 Linux Gentoo system. My port target is an NSLU2 (Intel XScale, ARMv5TE) running Linux of some stripe (probably Gentoo; I need to re-check the endianness anyway).

Set up a working directory under ~/src/lisp/sbcl/. I have a "pristine" tree, arm-port/sbcl-1.0.13-pristine/, which will not be touched during the port. I have a built linux/x86-64 tree, sbcl-1.0.13/, which serves as a template for what needs setting up, as make-target-config.sh isn't suited for a cross-build and wouldn't be able to detect an ARM system anyway. I also have my current working tree, arm-port/sbcl-1.0.13-arm5-pass1/, which is where I will do the initial hacking.

uname -a claims "Linux hikari 2.6.16 #1 PREEMPT Fri Jun 9 07:34:31 PDT 2006 armv5teb unknown unknown GNU/Linux". It looks to be a SlugOS system, big-endian.

Okay, downloaded the ARM docs for ARMv5 and the ABI, and an ARM7TDMI manual, to ~/src/lisp/sbcl/arm-port/cpu-docs/.

The ARM has sixteen addressable registers in normal operation. A check on the various SBCL and CMUCL backends shows that the only comparable environments are the RT and OLD-RT backends for CMUCL.

There are two options here in terms of GC support: we can use the Cheney GC or the generational GC. Or we can do the hard thing and make it a build-time option. For reference, the CMUCL RT backends are both cheneygc, all but the x86 and x86-64 SBCL backends are cheneygc, and all but the x86 SBCL backend have 32 addressable registers (the x86 having 8). I suspect that it will be easier to start with a gencgc environment, especially as it is the environment required to support threading.

First things first: add a tag to version.lisp-expr to indicate the ARM port.

Next, we need to configure local-target-features.lisp-expr and build-id.txt. In the future these will be generated automatically by make-config.sh, but for now we do them by hand.
For l-t-f, we set :arm as our architecture; :unix, :elf and :linux because we're on a Linux system; :gencgc because we want the conservative collector; and nothing for byte order because we're on a big-endian system. (We'd put :little-endian here if we were on an LSB-first system, which seems asymmetrical, but what can you do?) We also specify :stack-grows-downward-not-upward and :c-stack-is-control-stack, as the former is specified by the procedure call standard and the latter saves some stack switching.

For build-id, we just copy the one from our built x86-64 tree.

Next, we need to set up target directories and symlinks. The directories are src/assembly/arm/ and src/compiler/arm/. The symlinks are src/assembly/target, src/compiler/assembly and src/compiler/target. For the directories, we cp -r the versions from the archetypal gencgc target, x86. We set the symlinks up just as they would be for a normal build.

We now have enough of the environment configured to get into trouble.

* 2007-Dec-29:

A quick review of src/compiler/arm/parms.lisp shows that, while a good portion of it will need to be rewritten later, we don't need to modify it yet.

The first part of the backend hacking required is to define the machine model and the instruction definitions, or at the very least the register layout. src/compiler/arm/vm.lisp starts off with register definitions, followed by storage-base and storage-class definitions.

A quick refresher may be in order here. A storage-base, or SB, is a description of a set of places to store data, such as registers, a stack frame, or a function's literal storage area (for immediate values). A storage-class, or SC, is a description of a subset of an SB's locations with specific properties: where register values can be spilled if the registers need to be used for something else (if we want to store numbers on a separate stack from boxed data such as conses, for example), whether the registers are callee-save, and so on.
* 2007-Dec-30:

There are 16 general-purpose registers available: R0 through R12, SP (R13), LR (R14) and PC (R15). The ARM is a load-store architecture, and its function call handling is implemented with a "branch and link"-type instruction, similar to the one used on the MIPS (and PPC?), which stores the return PC in R14.

The registers don't have separately addressable octets within them the way the x86 register file does, so we would be better off taking the PPC version of the defreg and defregset macrolet as a base than the version from src/compiler/x86/vm.lisp.

* 2008-Mar-09:

Had some time today, so I thought I'd get a little more done.

Copied the register specs from src/compiler/ppc/vm.lisp into the new ARM version and hacked them up to suit (ARM spec register name style, not CMUCL register name style). The SBs didn't need tweaking yet; I took out the win32 conditional in kludge-nondeterministic-catch-block-size (we'll probably need to revisit this later anyway).

The SCs require a bit more damage. For now we'll assume that we're going with a conservative GC, and thus don't need a partitioned register set. We take out the :element-size junk for the main registers (as that's an x86 hack) and map the character registers to the normal register set. We also lose the word-reg and byte-reg SCs.

The other cleanups we need before we're done with vm.lisp for now are def-misc-reg-tns and some damage to location-print-name. (Done.)

With the basic machine model in place, we next move to the instruction definitions for the assembler and disassembler in src/compiler/arm/insts.lisp. We will be using the 32-bit ARM instruction set rather than the 16-bit Thumb set.

Step one: kill all of the floating-point instruction definitions.
Step two: kill all (well, almost all) of the disassembler setup.
Step three: kill all of the instruction definitions except the "miscellaneous hackery" instructions (header words and whatnot).
Step four: kill the effective-address support functions.
From here, we need to start laying in the new instruction definitions (not the disassembler definitions; I don't think we need those yet). But that's a task for next time.

* 2008-Nov-02:

My attention returned here a few days ago, and I finally decided to get a bit more done.

Laid in support for encoding "Addressing Mode 1", which covers the shifter operands for data-processing instructions. The other four addressing modes are for load/store instructions and will have to handle SBCL's treatment of stack frame locations. What I didn't add just yet is support for "constant" TNs, mostly because I don't know if constant TNs are useful with this addressing mode.

What I have is a function for use by instruction emitters to encode a shifter operand (a shifter-operand structure, a TN, or an immediate integer), and five functions for creating a shifter-operand structure given certain arguments. These are straightforward prefix versions of the same operators in "standard" ARM assembly.

* 2008-Nov-03:

Many ARM instructions have an optional "condition" field that allows execution only under certain circumstances, similar to the conditional CALL and RET instructions on the Z80. There are 15 condition codes, including an "Always (unconditional)" code. The data-processing instructions, my current focus, also include an "S" bit indicating that the condition codes should be set based on the result of the operation.

There are three groups of data-processing instructions: the unary set, which takes only shifter and destination operands; the comparison set, which takes only shifter and source operands and always has the S bit set; and the rest, which take shifter, destination and source operands.

The convention in "standard" ARM assembler is to concatenate the opcode name (such as "ADD"), the condition name (such as "LT"), and the indicator for the S bit ("S") to form an instruction name such as "ADDLTS". The quick-minded will have already realized that this means (* 15 2) variants for each instruction.
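To make the field layout concrete, here is a minimal C sketch (not the port's Lisp code) of how a condition suffix, the S bit and a register-form shifter operand combine into one data-processing word. The names emit_dp, shifter_reg_imm and lookup_condition are hypothetical; the bit positions follow the ARMv5 encoding, in which ADD's opcode is 0b0100. The condition table includes the duplicate spellings (HS/CS, LO/CC, and an empty suffix for AL), 18 entries in all.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical table: condition suffixes and their 4-bit encodings.
   HS/CS and LO/CC are synonyms and AL may be written as "", so there
   are 18 spellings for 15 conditions. */
struct cond_name { const char *name; unsigned code; };

static const struct cond_name cond_names[] = {
    {"EQ", 0},  {"NE", 1},  {"CS", 2},  {"HS", 2},  {"CC", 3},  {"LO", 3},
    {"MI", 4},  {"PL", 5},  {"VS", 6},  {"VC", 7},  {"HI", 8},  {"LS", 9},
    {"GE", 10}, {"LT", 11}, {"GT", 12}, {"LE", 13}, {"AL", 14}, {"", 14},
};
#define N_COND_NAMES (sizeof cond_names / sizeof cond_names[0])

/* Encoding for a condition suffix, or -1 if it is not recognised. */
static int lookup_condition(const char *suffix)
{
    for (size_t i = 0; i < N_COND_NAMES; i++)
        if (strcmp(cond_names[i].name, suffix) == 0)
            return (int)cond_names[i].code;
    return -1;
}

enum shift_type { SHIFT_LSL = 0, SHIFT_LSR = 1, SHIFT_ASR = 2, SHIFT_ROR = 3 };

/* Register-form shifter operand "Rm, <shift> #amount":
   shift_imm[11:7] shift[6:5] 0[4] Rm[3:0]. */
static uint32_t shifter_reg_imm(unsigned rm, enum shift_type shift, unsigned amount)
{
    assert(rm < 16 && amount < 32);
    return ((uint32_t)amount << 7) | ((uint32_t)shift << 5) | rm;
}

/* cond[31:28] 00 I[25] opcode[24:21] S[20] Rn[19:16] Rd[15:12] shifter[11:0].
   The comparison instructions would always pass set_flags = 1. */
static uint32_t emit_dp(const char *cond, unsigned opcode, int set_flags, int immediate,
                        unsigned rn, unsigned rd, uint32_t shifter)
{
    int c = lookup_condition(cond);
    assert(c >= 0 && opcode < 16 && rn < 16 && rd < 16 && shifter < 4096);
    return ((uint32_t)c << 28) | ((uint32_t)(immediate != 0) << 25)
         | ((uint32_t)opcode << 21) | ((uint32_t)(set_flags != 0) << 20)
         | ((uint32_t)rn << 16) | ((uint32_t)rd << 12) | shifter;
}
```

For example, emit_dp("LT", 4, 1, 0, 1, 0, shifter_reg_imm(2, SHIFT_LSL, 3)) yields 0xB0910182, the word for ADDLTS r0, r1, r2, LSL #3. Passing the "I" bit separately from the 12-bit shifter field is one way to get the two-value split considered for ENCODE-SHIFTER-OPERAND.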
That (* 15 2) figure is an optimistic estimate, as there are three condition codes with duplicate mnemonics (counting the elided mnemonic for the unconditional ("AL") condition), giving 18 condition spellings rather than 15. Additionally, there are 16 data-processing instructions, giving (* 16 18 2) => 576 mnemonics.

So, what's the alternative? The naive approach would give us (inst addlts r0-tn r1-tn (lsl r2-tn 3)) or similar. If we make the condition field an optional first argument, that would give us (inst adds :lt r0-tn r1-tn (lsl r2-tn 3)) and bring the number of mnemonics for the data-processing instructions down to (* 16 2) => 32. The downside is that this can't be implemented via an &optional argument in the instruction definition lambda list (optional parameters must follow the required ones); it will instead require some more involved parsing.

On the implementation side, we have a couple of support functions for things like encoding condition fields (originally from the x86 backend, with the condition names fixed up), a nice trick by way of DEFINE-BITFIELD-EMITTER, and a macro to define the actual instructions. This may have to be restructured later on for the disassembler, but that's not exactly a worry right now.

As a minor matter, I suspect that ENCODE-SHIFTER-OPERAND might be better off returning two values: one for the shifter_operand field in the instruction, and one for the "I" bit indicating an immediate-format shifter_operand instead of a register-format one.

One concern I still have is the inability to test any of this.

A comment from splittist on IRC suggests to me that writing up what I know about DEFINE-BITFIELD-EMITTER might be a good idea, either on its own merits or as part of more comprehensive documentation on how to write a new backend for the assembler.

* 2008-Nov-04:

Managed to set my NSLU2 back up again today, and tried a couple of simple tests of the system's behavior with respect to the signal handling SBCL requires. These are the very basics, such as "what signal do we get for accessing unmapped memory".
Unfortunately, the BKPT instruction doesn't appear to do anything useful, so I need to figure out why not, why GDB breakpoints work, and what alternative is available for SBCL to use for its error and pseudo-atomic traps.

The test program I used for the behavior of the BKPT instruction is:

    int main(void)
    {
        asm("bkpt #0");
        return 0;
    }

It either enters an infinite loop, or it stops and waits for something to occur. In GDB, if a breakpoint is placed at the entry to main(), GDB will indicate that its breakpoint triggered. If stepi is then issued to execute one instruction, the GDB prompt does not return until a SIGINT is sent. This suggests that stepi is implemented by temporarily placing a breakpoint at the next instruction in the program and resuming execution, and that whatever is going wrong with the BKPT instruction interferes.

The Linux 2.6.26 sources mention an SWI BREAK_POINT in arch/arm/kernel/traps.c. The code immediately after it calls ptrace_break(), implying that it should do what we need. Examination of include/asm-arm/unistd.h shows the existence of __ARM_NR_breakpoint at __ARM_NR_BASE+1, __ARM_NR_BASE at __NR_SYSCALL_BASE+0x0f0000, and __NR_SYSCALL_BASE at 0x900000. This suggests that a single-instruction breakpoint that the system will actually pay attention to could be SWI #0x9f0001. The test program for this should be:

    int main(void)
    {
        asm("swi #0x9f0001");
        return 0;
    }

... and it works Just Fine. That leaves one to wonder why BKPT doesn't also raise SIGTRAP, but whatever works. The actual breakpoint used might have to be a build-time conditional (we're likely to have to support a number of different conditionals for various environments anyway).

For future reference, the very next bit of code in arch/arm/kernel/traps.c after the breakpoint case is a cache flush operation. We might need this if our target has split I and D caches with no hardware coherency support.
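The breakpoint arithmetic above can be checked with a small C sketch. It rebuilds the SWI number from the three header constants and shows the 32-bit words that the two test programs' instructions assemble to in ARM state. The helper names are hypothetical, and the macro names deliberately drop the leading underscores of the kernel's names, since identifiers starting with a double underscore are reserved in C.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins for __NR_SYSCALL_BASE, __ARM_NR_BASE and
   __ARM_NR_breakpoint, with the values read from the 2.6-era headers. */
#define NR_SYSCALL_BASE   0x900000u
#define ARM_NR_BASE       (NR_SYSCALL_BASE + 0x0f0000u)
#define ARM_NR_BREAKPOINT (ARM_NR_BASE + 1u)

#define COND_AL 14u   /* "always" condition */

/* Hypothetical helper. SWI: cond[31:28] 1111[27:24] immed_24[23:0] */
static uint32_t encode_swi(unsigned cond, uint32_t imm24)
{
    assert(cond < 15 && imm24 < (1u << 24));
    return ((uint32_t)cond << 28) | (0xFu << 24) | imm24;
}

/* Hypothetical helper. BKPT is unconditional:
   1110 0001 0010 immed[15:4] 0111 immed[3:0] */
static uint32_t encode_bkpt(uint32_t imm16)
{
    assert(imm16 < (1u << 16));
    return 0xE1200070u | ((imm16 >> 4) << 8) | (imm16 & 0xFu);
}
```

So "swi #0x9f0001" is the word 0xEF9F0001 and "bkpt #0" is 0xE1200070. Note that BKPT has no condition field, whereas SWI does.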
* 2008-Nov-05:

As it stands now, we have definitions for the data-processing instructions. What we still need are the branch, multiply, miscellaneous arithmetic, status register, load-and-store and exception-generating instructions. We're dealing with an ARMv5 target, so we don't need things like the parallel addition instructions or many of the multiply instructions. We also don't need the BXJ branch instruction, as we have no interest in supporting the Jazelle (Java) state.

Defined the exception-generating instructions. Defined BKPT because we can't assume that all target environments will have the same damage that Linux does in this matter. Defined SWI because we happen to need it on Linux for our breakpoints. I will have to remember to add __ARM_NR_breakpoint to the stuff grovelled from headers during make-target-1, but that can come later; I'll be happy enough with a hardcoded constant for now.

While defining SWI, I noticed that there's a definite pattern to the emitter functions for instructions with condition fields. They're always:

    (flet ((internal-emitter (condition ...)
             ...))
      (if (keywordp (car args))
          (apply #'internal-emitter args)
          (apply #'internal-emitter :al args)))

which just cries out for a macro. WITH-CONDITION-DEFAULTED, maybe?

Branch instructions and load and store instructions need to support fixups and back-patches, and possibly even choosers. Not to mention the crazy addressing modes for the load and store instructions. I plan to deal with them later; I'd rather get the other instruction groups done first. Additionally, the semaphore instructions perform memory access using one of the modes available to load and store instructions, so I may as well put them off until I can see exactly what I'm working with there.

Defined the miscellaneous arithmetic instructions (all one of them).

Defined the macro WITH-CONDITION-DEFAULTED, and converted all current instruction definitions that needed it to use it.

The status instructions are MRS, MSR, CPS, and SETEND.
SETEND was introduced in ARMv6, so we can ignore it. CPS has no effect in user mode and was also introduced in ARMv6, so we can ignore it too. This leaves MRS and MSR. MSR has some complexity to it, particularly the field masks. Interestingly, the encoding for MSR fits neatly with the encoding for the data-processing instructions, particularly the encoding for the source argument, and MRS is close enough to use the same bitfield emitter. A project for another day.

There are some eleven multiply instructions to support for ARMv5TE. Also a project for another day.

So, what we have left to do for instruction definitions is:

1.) Branch instructions.
2.) Load/store instructions.
3.) Semaphore instructions.
4.) Load/store multiple instructions.
5.) Multiply instructions.
6.) Status register instructions.

And I think that's enough for now.

* 2008-Nov-06:

The branch instructions are B, BL, BX, two forms of BLX, and BXJ. They break down as follows:

  * B and BL take a 24-bit signed displacement giving a number of words relative not to the B or BL instruction, and not to the instruction after it, but to the instruction after that (that is, to the branch address plus 8). Both B and BL have a condition field.
  * BX takes a register argument containing a 32-bit absolute address. BX has a condition field.
  * The first form of BLX takes a 24-bit displacement and a single "halfword" address bit. This form of BLX does not have a condition field. It always enters Thumb state, and, as such, we are unlikely to need it.
  * The second form of BLX takes a register argument containing a 32-bit absolute address, with the low bit indicating a transition to Thumb state when set. This form of BLX has a condition field.
  * BXJ is for Jazelle state, or hardware JVM support. As such, we do not need it.

Note that, as the program counter is one of the addressable GPRs, it is possible to load the program counter with a data-processing instruction such as MOV.
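The "instruction after that" rule is easy to get wrong, so here is a minimal C sketch of encoding B/BL from two absolute addresses, plus the register forms of BX and BLX. It assumes a flat 32-bit address space; the function names are hypothetical. branch_in_range also makes the reach limit of the signed 24-bit word displacement explicit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: displacement in words, measured from branch + 8. */
static int64_t branch_words(uint32_t branch_addr, uint32_t target_addr)
{
    return ((int64_t)target_addr - ((int64_t)branch_addr + 8)) / 4;
}

/* Non-zero if the target is a whole number of words away and the word
   count fits in a signed 24-bit field. */
static int branch_in_range(uint32_t branch_addr, uint32_t target_addr)
{
    int64_t delta = (int64_t)target_addr - ((int64_t)branch_addr + 8);
    int64_t words = delta / 4;
    return delta % 4 == 0 && words >= -(INT64_C(1) << 23) && words < (INT64_C(1) << 23);
}

/* Hypothetical helper. B/BL: cond[31:28] 101[27:25] L[24] signed_immed_24[23:0] */
static uint32_t encode_branch(unsigned cond, int link, uint32_t branch_addr, uint32_t target_addr)
{
    assert(cond < 15 && branch_in_range(branch_addr, target_addr));
    /* Converting a negative word count to uint32_t wraps modulo 2^32,
       so masking yields the two's-complement 24-bit field. */
    uint32_t imm24 = (uint32_t)branch_words(branch_addr, target_addr) & 0xFFFFFFu;
    return ((uint32_t)cond << 28) | (0x5u << 25) | ((uint32_t)(link != 0) << 24) | imm24;
}

/* Hypothetical helper. BX Rm / BLX Rm: cond 0001 0010 1111 1111 1111 00L1 Rm */
static uint32_t encode_bx(unsigned cond, int link, unsigned rm)
{
    assert(cond < 15 && rm < 16);
    return ((uint32_t)cond << 28) | 0x012FFF10u | ((uint32_t)(link != 0) << 5) | rm;
}
```

A branch to itself encodes as 0xEAFFFFFE (a displacement of -2 words), and BX LR as 0xE12FFF1E.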
We might be able to do some "cute trick" involving loading the PC with an immediate and having a page or two of memory in a fixed location with useful routines at regular distances. If the page were at #x1000, for example, we could have entry points every 256 bytes. Unfortunately, the higher up in memory we need to place our routines, the less useful this trick is, due to the limitations of the immediate-constant encoding for data-processing instructions (an 8-bit value rotated right by an even number of bits).

Also note that the 24-bit signed displacement for B and BL instructions gives an effective range of approximately 32 megabytes either side of the branch instruction (2^23 words of 4 bytes each, give or take a word or two). As dynamic space tends to be larger than this, we cannot use B or BL for references outside the current code segment. Therefore, B and BL need only take labels as destinations.

As we are not currently planning on using Thumb instructions, we need only support the register form of BLX.

Defined the B, BL, BX and BLX instructions. I'm going to have to look over what SBCL provides for absolute fixups on other platforms, as not having fixups on branches or full 32-bit immediates feels weird to me.

* 2008-Nov-07:

Last night I figured out what was going on with the BKPT instruction, and it was an infinite loop after all. The BKPT instruction, on platforms without debug hardware, causes a prefetch abort. The Linux kernel treats this as a page fault, ensures that the required page is mapped, and restarts the instruction. Which, being the BKPT instruction, causes a prefetch abort...

Today I looked into fixups. It turns out that each target platform SBCL supports does its own thing with fixups. There are some common processing paths, but in the end each backend defines its own fixup kinds, and the actual processing of each fixup is done at runtime in FIXUP-CODE-OBJECT in src/code/FOO-vm.lisp and at genesis time in DO-COLD-FIXUP in src/compiler/generic/genesis.lisp.
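To make both the immediate-constant limitation and the fixup problem concrete, here is a minimal C sketch of the Addressing Mode 1 immediate check and of one way to materialize an arbitrary 32-bit address with only 8-bit immediates: a MOV followed by three ORRs, one byte each. The names are hypothetical and this is not SBCL's fixup machinery; always emitting four words, so that each byte of a fixed-up address has its own patchable slot, is an assumption about how such fixups might be applied.

```c
#include <assert.h>
#include <stdint.h>

static uint32_t rotl32(uint32_t v, unsigned n)
{
    n &= 31;
    return n == 0 ? v : (v << n) | (v >> (32 - n));
}

/* Hypothetical helper: encode v as "immed_8 rotated right by 2*rotate_imm",
   returning the 12-bit rotate_imm:immed_8 field, or -1 if there is none. */
static int32_t encode_rotated_immediate(uint32_t v)
{
    for (unsigned rot = 0; rot < 16; rot++) {
        uint32_t imm8 = rotl32(v, 2 * rot);   /* undo the rotate-right */
        if (imm8 <= 0xFFu)
            return (int32_t)((rot << 8) | imm8);
    }
    return -1;
}

/* Hypothetical helper: MOV Rd,#b0 ; ORR Rd,Rd,#b1<<8 ; ORR Rd,Rd,#b2<<16 ;
   ORR Rd,Rd,#b3<<24.  Zero bytes are still emitted, so there are always
   exactly four words. */
static void emit_load_constant(unsigned cond, unsigned rd, uint32_t value, uint32_t out[4])
{
    assert(cond < 15 && rd < 16);
    for (unsigned i = 0; i < 4; i++) {
        uint32_t byte = (value >> (8 * i)) & 0xFFu;
        uint32_t rot = (16 - 4 * i) % 16;          /* byte << 8i == byte ROR (32 - 8i) */
        uint32_t opcode = (i == 0) ? 0xDu : 0xCu;  /* MOV first, then ORR */
        uint32_t rn = (i == 0) ? 0u : rd;          /* MOV ignores Rn */
        out[i] = ((uint32_t)cond << 28) | (1u << 25) | (opcode << 21)
               | (rn << 16) | ((uint32_t)rd << 12) | (rot << 8) | byte;
    }
}
```

encode_rotated_immediate accepts every #x1000 + k*256 entry point up to #x1F00 (each is a byte shifted left by 8), but rejects values such as #x10000100 whose set bits span more than 8 bits. That is the "higher up in memory" problem with the cute trick.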
Judging by the genesis support for Alpha fixups, the Alpha has a fixed instruction word width, probably 32 bits, and can only load 16 bits of a value at a time. This makes for four fixups per foreign address, which is effectively the situation we have on ARM (32-bit addresses, but we can only load 8 bits at a time). Further investigation leads us to the use of DEFINE-INSTRUCTION-MACRO in the Alpha instruction definitions, along with the magic involved in expanding the 'LD' instruction on that platform. But that's a topic for another time.

As a practical upshot, however, because the load/store instruction encoding allows only a 12-bit offset and always requires a base register, those instructions do not interact well with fixups. This means two things. First, fixups need not be taken into account when devising support for the load/store instructions. Second, the MOV data-processing instruction may need revisiting to support loading a fixup address, although there may be a less code-intensive method of doing so, given that we have relatively simple PC-relative data addressing for load/store instructions (a consequence of the PC being part of the addressable register set). Maybe we can make a label a valid thing to use as an address for a load/store instruction, and have VOPs that need fixups store them in the elsewhere segment via an (inst dword (make-fixup ...))?

* 2008-Nov-08:

There are nine variants of the load/store addressing mode. Upon investigation, this turned out to be a cross-product effect between three address calculations and three application methods (offset, pre-indexed and post-indexed, the latter two of which update the base register with the result of the address calculation). This is similar to the cross-product effect on full-call VOPs (see the commentary above the macrolet define-full-call in src/compiler/x86/call.lisp for details, or just wait; I'm sure I'll mention it again when the time comes to deal with calling conventions).
So, we have three address calculation options, all applied to a base register: an unsigned 12-bit immediate offset, a register offset, or a scaled register offset. The plain register offset is just the scaled form with a shift of zero (the same shortcut as in the data-processing instructions), so we can use the same support functions as the data-processing shifter-operand modes use. This leaves us with specifying the base register, addition or subtraction, and offset/pre-index/post-index addressing.

Maybe a macro such as

    (defmacro @ (base &optional (direction '+) (offset 0) (mode :offset))
      (declare (type (member + -) direction))
      (declare (type (or integer shifter-operand) offset))
      (declare (type (member :offset :pre-index :post-index) mode))
      ...)

might do the trick. This means that a straight memory reference would be something like (inst ldr r0-tn (@ r1-tn)), a more complex reference such as might be used in an array reference would be (inst ldr r0-tn (@ r1-tn + (lsl r2-tn 4))), and I'm sure we'll find a use for (inst ldr r0-tn (@ r1-tn + 4 :post-index)), probably in one of the multiple-value handling VOPs surrounding call/return.

The obvious objection to this macro is that the only thing it does that a function couldn't is allow an unquoted + or - for the direction. I could go either way at this point. Or perhaps eliminate the direction parameter in favor of checking for and unwrapping a possible unary - expression as the offset.

The only real complexities to handle here are disallowing register-by-register shifts, using offset encoding instead of immediate encoding for integers, and dealing with negative integers as offsets (because someone will want to try it, and it's neither particularly difficult to support here nor fair to force the required circumlocutions upon the call site).

* 2008-Nov-09:

I decided to go with the macro @, with no direction parameter (unwrapping unary -), and a helper function to do the dirty work with the direction parameter.
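As a rough C sketch of what an @ operand with an integer offset has to turn into, here are the three modes and a signed offset mapped onto the P, U and W bits of the ARMv5 "Addressing Mode 2" immediate form. The function and enum names are hypothetical, and only immediate offsets are covered.

```c
#include <assert.h>
#include <stdint.h>

enum index_mode { MODE_OFFSET, MODE_PRE_INDEX, MODE_POST_INDEX };

/* Hypothetical helper. LDR/STR/LDRB/STRB, immediate offset:
   cond[31:28] 01 I=0 P U B W L Rn[19:16] Rd[15:12] offset_12[11:0]
   A negative offset is encoded as its magnitude with U (add) clear. */
static uint32_t encode_ls_imm(unsigned cond, int load, int byte, unsigned rd,
                              unsigned rn, int32_t offset, enum index_mode mode)
{
    uint32_t up = (offset >= 0);
    uint32_t magnitude = up ? (uint32_t)offset : (uint32_t)(-(int64_t)offset);
    uint32_t p = (mode != MODE_POST_INDEX);   /* P = 0 only for post-index */
    /* Post-index always writes back; W = 1 with P = 0 would select LDRT/STRT. */
    uint32_t w = (mode == MODE_PRE_INDEX);
    assert(cond < 15 && rd < 16 && rn < 16 && magnitude < 4096);
    return ((uint32_t)cond << 28) | (1u << 26) | (p << 24) | (up << 23)
         | ((uint32_t)(byte != 0) << 22) | (w << 21) | ((uint32_t)(load != 0) << 20)
         | ((uint32_t)rn << 16) | ((uint32_t)rd << 12) | magnitude;
}
```

In this sketch (@ r1-tn + 4 :post-index) with LDR corresponds to encode_ls_imm(14, 1, 0, 0, 1, 4, MODE_POST_INDEX), which gives 0xE4910004. A negative offset never needs a negative field, because it only clears U, which is why unwrapping a unary - is cheap to support.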
I had to introduce a memory-operand structure to hold the required information. As the load/store instructions have a regular structure, I implemented an emitter function to do the dirty work. It currently handles all cases except labels as bases (which needs doing soon) and TNs as addresses (which needs doing once I figure out what I'm doing for stack layout and access).

Implemented the LDR, LDRB, STR and STRB instructions in terms of the emitter function.

So, of course, while I was filing off the rough edges and whatnot, I noticed an entire second group of load/store instructions with a different addressing mode. These load/store instructions are for loading sign-extended bytes and for loading and storing halfwords and doublewords. The addressing mode for these instructions turns out to be a subset of the one for the other load/store instructions: 8-bit offsets instead of 12-bit offsets, and no shifter operands, just registers or integers as offsets. It still allows the offset, pre-index and post-index modes. This means that we can still use the same @ macro without change to specify the address for these instructions, but we'll need a separate encoder (no great loss). A task for tomorrow, I think.

* 2008-Nov-10:

Filled in the case for label bases in load/store instructions.

The semaphore instructions, as mentioned a few days ago, use one of the addressing modes defined for the load/store instructions. Specifically, they use a base register only. This can easily be implemented via the current memory-operand setup with a limitation of an offset of 0 and a mode of :offset. These instructions are marked as deprecated in ARMv6, but as I have an ARMv5TE, that's not really an issue.

Implemented the SWP and SWPB instructions.

Did a little more thinking about the status register instructions, particularly MSR. MSR has a bitfield specifying which of four fields in the status register will be updated.
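Two of the encodings from this entry can be sketched in C. The first is SWP/SWPB, whose only addressing form is a bare base register. The second is the 4-bit MSR field mask, using the standard assembler's field letters (c, x, s and f, which land in bits 16 to 19 of the instruction). Only the register form of MSR targeting the CPSR (R = 0) is assumed, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper. SWP{B} Rd, Rm, [Rn]: cond 0001 0B00 Rn Rd 0000 1001 Rm */
static uint32_t encode_swp(unsigned cond, int byte, unsigned rd, unsigned rm, unsigned rn)
{
    assert(cond < 15 && rd < 16 && rm < 16 && rn < 16);
    return ((uint32_t)cond << 28) | (1u << 24) | ((uint32_t)(byte != 0) << 22)
         | ((uint32_t)rn << 16) | ((uint32_t)rd << 12) | (0x9u << 4) | rm;
}

/* Hypothetical helper. Turn the letters after "CPSR_" into the field mask
   (c = 1, x = 2, s = 4, f = 8).  Returns -1 for an empty string, an unknown
   letter, or a repeated letter. */
static int cpsr_field_mask(const char *fields)
{
    int mask = 0;
    for (const char *p = fields; *p != '\0'; p++) {
        int bit;
        switch (*p) {
        case 'c': bit = 1; break;
        case 'x': bit = 2; break;
        case 's': bit = 4; break;
        case 'f': bit = 8; break;
        default:  return -1;
        }
        if (mask & bit)
            return -1;
        mask |= bit;
    }
    return mask == 0 ? -1 : mask;
}

/* Hypothetical helper. MSR CPSR_<fields>, Rm (register form, R = 0):
   cond 0001 0010 field_mask 1111 0000 0000 Rm */
static uint32_t encode_msr_cpsr_reg(unsigned cond, unsigned mask, unsigned rm)
{
    assert(cond < 15 && mask < 16 && rm < 16);
    return ((uint32_t)cond << 28) | 0x0120F000u | ((uint32_t)mask << 16) | rm;
}
```

Under these assumptions, MSR CPSR_fc, r0 is encode_msr_cpsr_reg(14, cpsr_field_mask("fc"), 0), or 0xE129F000. An SPSR variant would only need to set bit 22 (R).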
The method used to specify the contents of the MSR field-mask bitfield in standard ARM assembler is CPSR_<fields> or SPSR_<fields>, where <fields> is one or more of c, x, s and f. Accessing the SPSR from user (or system) mode is "UNPREDICTABLE", so we just need to cover the CPSR case. What I'm thinking of is a macro, CPSR, which takes an unevaluated string designator as its argument to specify which fields to update, and expands to the integer field mask. Should we decide later on to support access to the SPSR, we can have an extra bit in the encoded result from an SPSR macro, as it would not increase complexity overmuch.

Implemented the MRS and MSR instructions: the full MSR, with both CPSR and SPSR macros for the field mask. MRS takes a keyword of :cpsr or :spsr to indicate which status register to use.

At this point, what we have left for instruction definitions is:

1.) The "miscellaneous" load/store instructions (sign-extended load, halfword and doubleword load/store).
2.) The load/store multiple instructions.
3.) The multiply instructions.

* 2008-Nov-11:

There are eleven multiply instructions that we need to support for ARMv5TE. Of these eleven, three take three registers as arguments, and the remaining eight take four registers as arguments.

The manual breaks the multiply instructions down into six groups, four of which we care about. Of the four groups we care about, two support "S" versions of their instructions that set the N and Z flags according to the result. Each group contains both multiply and multiply-accumulate instructions.

1.) "Normal multiply". These have "S" versions. The instructions are MUL and MLA.
2.) "Long multiply". These have "S" versions. The instructions are SMULL, UMULL, SMLAL and UMLAL.
3.) "Halfword multiply". These do not have "S" versions. The instructions are SMULxy, SMLAxy and SMLALxy.
4.) "Word x halfword multiply". These do not have "S" versions. The instructions are SMULWy and SMLAWy.
An x or y in an instruction mnemonic here indicates that the real instruction has a T or B in that place (selecting the top or bottom halfword of the corresponding operand).

Any instruction that is a multiply-accumulate (MLA) instruction takes four registers as arguments. Any instruction with a 64-bit (long) result (MULL or MLAL) also takes four registers as arguments. The other three instructions (MUL, SMULxy and SMULWy) take three registers as arguments.

For our purposes, it might make sense to use three groups of multiply instructions: normal and long, by-halfword, and SMLALxy.

The multiply instructions have a regular encoding: a 4-bit cond field, an 8-bit opcode field, three 4-bit register fields (the second of which should be zero for three-register instructions), another 4-bit opcode field, and a final 4-bit register field.

Checking to find the pattern to the register order (in the field-order codes, d is Rd, D is the second destination RD, n is Rn, s is Rs, m is Rm, and Z is a should-be-zero field):

    MUL      Rd Rm Rs       (dZsm)
    SMULxy   Rd Rm Rs       (dZsm)
    SMULWy   Rd Rm Rs       (dZsm)
    MLA      Rd Rm Rs Rn    (dnsm)
    SMLAxy   Rd Rm Rs Rn    (dnsm)
    SMLAWy   Rd Rm Rs Rn    (dnsm)
    SMULL    Rd RD Rm Rs    (Ddsm)
    UMULL    Rd RD Rm Rs    (Ddsm)
    SMLAL    Rd RD Rm Rs    (Ddsm)
    UMLAL    Rd RD Rm Rs    (Ddsm)
    SMLALxy  Rd RD Rm Rs    (Ddsm)

What we see here is that the two multiplicands (Rm and Rs) are always in the same fields. From there, we break things out into "long" instructions, which have one pattern for their result registers, and the other instructions, which have a given place for their result register and another for either the register containing the addend (for MLA-type instructions) or an SBZ field (for MUL-type instructions).
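The three register patterns translate directly into a single emitter. Below is a minimal C sketch with hypothetical names, taking the operands in assembler order. The op1/op2 values used in the checks are the ARMv5TE encodings (MUL 0000000S/1001, MLA 0000001S/1001, UMULL 0000100S/1001, SMULBB 00010110/1000).

```c
#include <assert.h>
#include <stdint.h>

/* Register-field patterns (field order [19:16] [15:12] [11:8] [3:0]):
   dZsm: MUL Rd, Rm, Rs           -> Rd,   0,    Rs, Rm
   dnsm: MLA Rd, Rm, Rs, Rn       -> Rd,   Rn,   Rs, Rm
   Ddsm: UMULL RdLo, RdHi, Rm, Rs -> RdHi, RdLo, Rs, Rm */
enum mul_pattern { PAT_DZSM, PAT_DNSM, PAT_DDSM };

/* Hypothetical helper.
   cond[31:28] op1[27:20] A[19:16] B[15:12] Rs[11:8] op2[7:4] Rm[3:0].
   first..fourth are the register operands in assembler order
   (fourth is ignored for dZsm). */
static uint32_t emit_multiply(unsigned cond, unsigned op1, unsigned op2, enum mul_pattern pat,
                              unsigned first, unsigned second, unsigned third, unsigned fourth)
{
    unsigned a, b, rs, rm;
    switch (pat) {
    case PAT_DZSM: a = first;  b = 0;      rm = second; rs = third;  break;
    case PAT_DNSM: a = first;  b = fourth; rm = second; rs = third;  break;
    default:       a = second; b = first;  rm = third;  rs = fourth; break; /* Ddsm */
    }
    assert(cond < 15 && op1 < 256 && op2 < 16);
    assert(a < 16 && b < 16 && rs < 16 && rm < 16);
    return ((uint32_t)cond << 28) | ((uint32_t)op1 << 20) | ((uint32_t)a << 16)
         | ((uint32_t)b << 12) | ((uint32_t)rs << 8) | ((uint32_t)op2 << 4) | rm;
}
```

The "S" variants are just op1 with its low bit set, and the xy/y suffixes only change op1 and op2, which is why a name, a pattern and the two opcode fields are enough.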
I added the opcode fields to the chart above so that I could look for the pattern in the opcode encoding:

    MUL      Rd Rm Rs       (dZsm)  0000000S  1001
    MLA      Rd Rm Rs Rn    (dnsm)  0000001S  1001
    UMULL    Rd RD Rm Rs    (Ddsm)  0000100S  1001
    UMLAL    Rd RD Rm Rs    (Ddsm)  0000101S  1001
    SMULL    Rd RD Rm Rs    (Ddsm)  0000110S  1001
    SMLAL    Rd RD Rm Rs    (Ddsm)  0000111S  1001
    SMLAxy   Rd Rm Rs Rn    (dnsm)  00010000  1yx0
    SMLALxy  Rd RD Rm Rs    (Ddsm)  00010100  1yx0
    SMULxy   Rd Rm Rs       (dZsm)  00010110  1yx0
    SMLAWy   Rd Rm Rs Rn    (dnsm)  00010010  1y00
    SMULWy   Rd Rm Rs       (dZsm)  00010010  1y10

I'm not sure where this gets me.

Actually, an hour or more later, I am sure where this gets me: a starting point. There are three register patterns and corresponding register field mappings (dZsm, dnsm and Ddsm). A macrolet which takes a name, a symbol for the field mapping, and the contents of the two opcode fields is the bare-minimum interface, and is clearly sufficient to encode everything without additional trickery. Once that much is implemented, along with the six non-halfword instructions and their "S" variants, we can figure out whether we want to be clever with the remaining instructions.

Defined a bitfield emitter for the multiply instructions. Implemented the non-halfword multiply instructions.

Decided not to try anything cute with symbol-name hacking to automatically fill out the xy suffixes for the remaining instructions, and instead to use the existing macrolet and spend the sixteen lines of source required. Implemented the remaining multiply instructions.

* 2008-Nov-12:

Skimming over src/compiler/*/insts.lisp does not lead me to believe that any existing backend supports multiple-register load/store instructions that take a bitfield of registers to operate on. As such, I'm not convinced at this point that we need to, or can easily, support the LDM and STM instructions. Should they turn out to be needed, I am perfectly willing to revisit this decision later.

This leaves the halfword and sign-extend load and store instructions.
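The encoding for this remaining group (LDRH, STRH, LDRSB, LDRSH, LDRD and STRD) can be sketched in C as well. This hypothetical helper covers only the immediate-offset form. The 8-bit offset is split into two nibbles, bits 7:4 are 1SH1, and LDRD/STRD sit in the L = 0, S = 1 slots. The even, non-R14 register check reflects the ARMv5TE requirement that the doubleword pair be Rd and Rd+1.

```c
#include <assert.h>
#include <stdint.h>

enum index_mode { MODE_OFFSET, MODE_PRE_INDEX, MODE_POST_INDEX };

/* Hypothetical helper. Miscellaneous load/store, immediate-offset form:
   cond[31:28] 000 P U 1 W L Rn[19:16] Rd[15:12] immedH[11:8] 1 S H 1 immedL[3:0]
   sh = S:H: 1 = halfword (LDRH/STRH), 2 = LDRSB, 3 = LDRSH;
   with load == 0, sh = 2 and 3 select LDRD and STRD instead. */
static uint32_t encode_misc_ls_imm(unsigned cond, int load, unsigned sh, unsigned rd,
                                   unsigned rn, int32_t offset, enum index_mode mode)
{
    uint32_t up = (offset >= 0);
    uint32_t mag = up ? (uint32_t)offset : (uint32_t)(-(int64_t)offset);
    uint32_t p = (mode != MODE_POST_INDEX);
    uint32_t w = (mode == MODE_PRE_INDEX);   /* post-index writes back with W = 0 */
    assert(cond < 15 && sh >= 1 && sh <= 3 && rd < 16 && rn < 16 && mag < 256);
    if (!load && (sh & 2u))
        assert(rd % 2 == 0 && rd != 14);     /* LDRD/STRD use the pair Rd, Rd+1 */
    return ((uint32_t)cond << 28) | (p << 24) | (up << 23) | (1u << 22) | (w << 21)
         | ((uint32_t)(load != 0) << 20) | ((uint32_t)rn << 16) | ((uint32_t)rd << 12)
         | ((mag >> 4) << 8) | (1u << 7) | ((uint32_t)sh << 5) | (1u << 4) | (mag & 0xFu);
}
```

Apart from the bit 22 "immediate" flag and the split offset, the P/U/W handling is the same as for the word and byte loads, so the same @ operand can drive both encoders.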
For LDRD and STRD, Rd is required to be an even-numbered register (the pair used is Rd and Rd+1). This will likely require a special storage-class in order to explain things properly to the register allocation logic. As it is less critical for assembly routines and similar code fragments, we shall simply AVER that the TN-OFFSET is even.

Interestingly, the closest-matched instruction format that we already have defined is emit-multiply-instruction. The field (byte 4 4) is specific to the instruction, not the addressing mode; the field (byte 3 25) is zeros; the field (byte 4 21) is specific to the addressing mode; and the field (byte 1 20) is the last bit of the instruction-specific bits and can easily be brought in with a logior if necessary.

Implemented the encoder for the miscellaneous load/store instructions. Defined the miscellaneous load/store instructions. Moved all load/store instructions to the end of insts.lisp so that they could be together in one place and still use the multiply instruction emitter.

With this, all of the instructions that I'm planning on implementing have been implemented.

So, how can I test all this? What needs doing next? Well, the most basic test is "does it compile". And the answer is "of course not".

Build 1: It turns out that tools-for-build/ldso-stubs.lisp needs to be customized for each new CPU and possibly for each platform toolchain. Fortunately, investigation shows that this step can safely be skipped until we get to make-target-1.

Build 2: In src/compiler/arm/vm.lisp, there's a def!constant cfp-offset that hasn't been adjusted from the x86 value of ebp-offset. Fixed.

Build 3: In src/compiler/arm/vm.lisp, we copied most of the file from the x86 backend, but the register specs were copied from the PPC backend. The PPC port defined register-arg-names, but the x86 port defined *register-arg-names* and used it later in the file. Changed to the x86 convention.

Build 4: The catch-block-size kludge failed its AVER.
This is because the x86oid ports, unlike all other ports, store the restart address for a catch-block as an unboxed PC, instead of as a boxed code-object pointer plus an offset within the code object. Changed the kludge size.

Build 5: In src/compiler/arm/insts.lisp, there is a reader-error after the definition of emit-load/store-instruction: "comma not inside a backquote". Added a missing backquote in macrolet define-load/store-instruction.

Build 6: In src/compiler/arm/insts.lisp, there is a reader-error after the definition of emit-misc-load/store-instruction: "comma not inside a backquote". Added a missing backquote in macrolet define-misc-load/store-instruction.

Build 7: Continuing on with src/compiler/arm/insts.lisp: 36 WARNINGs, 8 STYLE-WARNINGs, and 6 notes. On the upside, we have a RECOMPILE restart, so we can iterate until we get something without warnings.

That's enough for today, I think. Tomorrow, I'll concentrate on making insts.lisp compile cleanly.

* 2008-Nov-13:

In encode-status-register-fields, ASSOC returns a CONS, not its CDR.

In define-instruction MSR, forgot to put the field-mask in the call to emit-dp-instruction.

In define-data-processing-instruction (why isn't this a macrolet?), neglected to place appropriate commas and quotes around some parameterization code.

In emit-semaphore-instruction, accidentally used define instead of defun.

In emit-load/store-instruction and emit-misc-load/store-instruction, used DEST instead of BASE when obtaining the label offset within the code segment (copy/paste error).

In emit-misc-load/store-instruction, an undefined-variable error led me to notice that I had forgotten a parameter to flet compute-opcode to indicate register or immediate offset.

In emit-load/store-instruction and emit-misc-load/store-instruction, forgot to quote the second argument to typep.

In encode-status-register-fields, used string-length instead of length.
Commented out the fixup emitters, as they are from the x86 backend and at least one of them is obviously wrong. Commented out the WORD instruction definition and renamed the DWORD definition to WORD. If we turn out to want a literal halfword-emitter, we can add it back in later.

Build 8: In src/compiler/arm/insts.lisp, still calling emit-dword in two places. Fixed. In src/compiler/arm/macros.lisp, there is a function allocation-dynamic-extent, which refers to eax-offset. This file is entirely x86-specific helpers and assembly fragments, so it is out of scope for today.

At this point I need to start thinking about more involved design issues such as calling conventions and pseudo-atomic. I'm thinking of taking a short break first to work on other things. A patch containing my progress to date, based on SBCL 1.0.13, is available at . It is mostly copies of the x86 backend at this point.

* 2009-May-19: It's been a while, and I've started thinking about the ARM port again. It has recently occurred to me to try a different approach to the problem, and that my instruction definitions may be somewhat wrong. Additionally, a lot has changed since sbcl-1.0.13, and keeping up to date will be easier if I keep everything in a git branch. Using a git branch means I can also use a public git repository, which is something I am experimenting with.

As of this writing, the current version of SBCL is 1.0.28.60. In 1.0.28.15, the build system was refactored, making it easier to set up the partial environment used for porting. Also, with a public repository, I may as well set the environment up "right" from the start.

* Added ARM as a target arch to make-config.sh, but not (yet) for auto-detection. "SBCL_ARCH=arm ./make.sh" now starts to do the right thing. Values for local-target-features chosen at semi-random.
* Added ARM to the mapping for /target/ path components in src/cold/shared.lisp.
* Added a dummy stub for ARM to tools-for-build/ldso-stubify.lisp.
Pushed these changes to repo.or.cz. http://repo.or.cz/w/sbcl/nyef.git is the gitweb and the mirror URL is git://repo.or.cz/sbcl/nyef.git (for git-clone).

* src/compiler/arm/parms.lisp in my old tree appears to be an unedited copy of the corresponding x86 version. Changes in the meantime have rendered it uncompilable, so I copied the current x86 version instead.
* src/compiler/arm/backend-parms.lisp appears to be an unedited copy of the corresponding x86 version. This appears to be slightly wrong, as there's a +backend-fasl-file-implementation+ value of :x86. Copied the current x86 version, changing the backend fasl file implementation value to :arm.
* Copied src/compiler/arm/vm.lisp from the old tree. The only critical change since it was created appears to have been a change to kludge-nondeterministic-catch-block-size, as an unused catch-block slot has been removed.
* Copied src/compiler/arm/insts.lisp from the old tree.
* Created a dummy src/compiler/arm/macros.lisp file to make the build happy.
* Added src/assembly/arm/support.lisp as a copy of the x86 version. This file contains a few short assembly fragments to implement part of the calling convention, which were not converted.
* Added dummy files for the remainder of the files belonging in src/assembly/arm/.
* Added dummy files for the remainder of the files belonging in src/compiler/arm/.

Now that all of the backend files mentioned in the current build order exist (src/code/arm-vm isn't yet mentioned, and we can worry about it later), we can get down to making it actually build.

The first thing we run into is that src/compiler/generic/late-nlx.lisp uses the definitions of a couple of VOPs from src/compiler/target/nlx.lisp at compile time (save-dynamic-state and restore-dynamic-state). Copying the two VOPs in question from the x86 backend and conditioning out the generators causes errors because there are no "move functions" defined to move to/from alternate or constant SCs.
What I'm trying to do at this point is get through make-host-1 and first genesis. Once that happens, I'm planning on globally forcing :trace-file to be enabled for all stems in the host-2 build (from-xc) and adding things piecemeal for a while. On the other hand, I may just get the cross-compiler built and then start running small test files through it to debug the code generators a single function at a time. To be decided later.

Backing up a paragraph, move functions. A look at src/compiler/x86/move.lisp shows use of define-move-fun, define-move-vop, and a single primitive-type-vop. The primitive-type-vop can be ignored for now (no doubt we'll touch upon that entire system later). The other two macros are defined in src/compiler/meta-vmdef.lisp, and have some obtuse commentary and implementation. There is a mention that all uses of define-move-fun should occur before any VOP definitions. This doesn't actually happen on any backend: build-order has target/move before target/float, target/move has some define-vops, and target/float has some define-move-funs.

Anyway, define-move-fun takes a name, a cost, a 3-arg lambda list, a list of lists of SCs, and a body. The lambda list is traditionally always (vop x y), with x and y being the source and destination of the move operation... or was it the other way around? A good reason to break with tradition, to my mind. The body is the code to generate, and is wrapped in an ASSEMBLE form by the macro.

This leaves the list of lists of SCs, which was actually the hard part of figuring out what the macro does. There must be an even number of lists of SCs. For each pair of these lists (in order, by #'cddr) the first is a list of SCs to move from and the second is a list of SCs to move to. Within each pair of lists, a cross-product is set up from each SC in the from list to each SC in the to list, and the move function is set up as the appropriate move function from the one SC to the other with the specified cost.
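The pairing rule for the SC lists is easier to see in a toy model. The following Python sketch reproduces only the bookkeeping described above; it is not the meta-vmdef.lisp macro, expand_move_fun is a hypothetical name, and the SC names in the example are used purely as labels.

```python
def expand_move_fun(name, cost, sc_lists):
    """Return {(from_sc, to_sc): (name, cost)} for define-move-fun style SC lists."""
    if len(sc_lists) % 2 != 0:
        raise ValueError("SC lists must come in (from-scs to-scs) pairs")
    table = {}
    # Step through the lists two at a time (the "by #'cddr" walk).
    for from_scs, to_scs in zip(sc_lists[0::2], sc_lists[1::2]):
        # Cross product: every source SC gets this move function to every target SC.
        for from_sc in from_scs:
            for to_sc in to_scs:
                table[(from_sc, to_sc)] = (name, cost)
    return table


moves = expand_move_fun("load-stack", 5,
                        [["control-stack"], ["any-reg", "descriptor-reg"]])
for (src, dest), (fun, cost) in sorted(moves.items()):
    print(f"{src} -> {dest}: {fun} (cost {cost})")
```

One pair of lists with one source and two destinations yields two table entries; adding a second pair of lists simply adds its own cross product to the same table.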
And this explanation is likely as impenetrable as the code was.

A quick grep of src/compiler/*/move.lisp shows seven instances of define-move-fun for the x86oid ports, twelve for the alpha port, and nine for the other ports.

+--------------------------+------+-----+-------+-------+
| move function name       | cost | x86 | alpha | other |
+--------------------------+------+-----+-------+-------+
| load-immediate           |  1   |  X  |   X   |   X   |
| load-number              |  1   |  X  |   X   |   X   |
| load-character           |  1   |  X  |   X   |   X   |
| load-system-area-pointer |  1   |  X  |   X   |   X   |
| load-constant            |  5   |  X  |   X   |   X   |
| load-stack               |  5   |  X  |   X   |   X   |
| load-number-stack        |  5   |     |   X   |   X   |
| load-number-stack-64     |  5   |     |   X   |       |
| store-stack              |  5   |  X  |   X   |   X   |
| store-number-stack       |  5   |     |   X   |   X   |
| store-number-stack-64    |  5   |     |   X   |       |
+--------------------------+------+-----+-------+-------+

There are two obvious divisions in this breakdown. The first is those ports without a separate number (non-descriptor) stack. These are the x86oid ports, which use gencgc, which conservatively scavenges the control stack. The second is the alpha, with its funky 32/64-bit hybrid VM model.

Looking at src/compiler/arm/vm.lisp, I see that I went with the x86oid single-stack model, which means I should probably start from the x86 move functions, mostly to get the initial lists of SCs. A simple contextual analysis of the code for the move VOPs shows that the third argument (traditionally y) is the destination, so I'm going with a lambda list of (vop src dest), which is a little more obvious than (vop x y). Rather than deal with any platform-specific details for instruction sequences for the move functions, I'm just going to condition out their bodies for now and plan on filling them back in later.

* Copied initial set of move-funs into src/compiler/arm/move.lisp from x86 version, disabling the code generation bodies and renaming the function parameters.
* Copied save-dynamic-state and restore-dynamic-state VOP definitions into src/compiler/arm/nlx.lisp from x86 version, disabling the generators.

Next problem: the build fails on src/compiler/generic/array.lisp, with undefined variable nil-array-accessed-error and undefined function error-call. Error-call is a macro defined in src/compiler/target/macros.lisp, so copy from the x86 version and... Wait, what? It's a function on x86oids? Anyway, take the x86 version, comment out the bit that actually generates code (notice a pattern here?)...

* Copied error-code handling functions to src/compiler/arm/macros from the x86 version, disabling the code generation.

So, of course, this still doesn't work because it's the wrong thing. It turns out that error-call being a function rather than a macro on the x86oids causes gratuitous x86oid conditionals in src/compiler/generic/array.lisp, and using the x86 version means we're not covered under that exemption. The right thing to do is to convert the x86oid version of error-call to be a macro again, but that'll have to wait as it's an interface change. Meanwhile, back out the previous step and use a real version.

* Copied error-code handling functions to src/compiler/arm/macros from the ppc version, disabling the code generation.

Next up, src/compiler/generic/late-type-vops.lisp fails to build: 22 errors, 203 warnings and 5 style-warnings. It turns out that every top-level form in the file is either an in-package, a !define-type-vops, or a macrolet that expands to a !define-type-vops. The macro in question is defined in src/compiler/target/type-vops.lisp. A sufficiently involved problem that I'm inclined to leave it for another day.

* 2009-May-20: Started looking at src/compiler/target/type-vops. Taking a look at all of the definitions in each backend, what they do, and where else they are mentioned.

The first part is those functions used in the macroexpansion of test-type in src/compiler/generic/early-type-vops.
These are %test-fixnum, %test-fixnum-and-headers, %test-immediate, %test-lowtag, and %test-headers. On 64-bit ports, these also include %test-fixnum-and-immediate, %test-immediate-and-headers, and %test-fixnum-immediate-and-headers. test-type itself is used in compiler/target/type-vops, compiler/ppc/subprim, compiler/ppc/values, compiler/sparc/subprim, and compiler/sparc/values. An examination of circumstances shows that the use in values is in VOP values-list for a list end test and the uses in subprim are in VOP length/list for a list end test. There is a comment in the PPC subprim to the effect that changing the code to be more like the other ports (which have an explicit test inline rather than using test-type) might be a good idea. Though I have to wonder whether moving towards using test-type there might be the better idea.

The second part is the basic type check and predicate VOPs and the macro to support the generic type-vop definition in src/compiler/generic/late-type-vops. The macro !define-type-vops appears to have some parameters which are meaningful only to some backends and not others. It defines predicate VOPs and check VOPs. Predicate VOPs are defined to :translate their corresponding predicate function. If a primitive type is specified, check VOPs are set as the :check type VOP for the primitive type.

The third part is predicates and checks for integer value ranges. Not much to say about that.

The fourth part is predicates and checks for cons and symbol types. This is required because NIL is both a list and a symbol: its lowtag is set up such that it doesn't work with test-type for symbol-widetag, and there is no separate cons lowtag, so it needs to be filtered out of the list-lowtag test for the cons type.

Unfortunately, most of this is code generation. Fortunately, the only part that needs even so much as a dummy implementation in order to proceed is !define-type-vops, and it probably doesn't need to do its full job yet.
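To make the NIL problem concrete, here is a Python sketch of the lowtag arithmetic. It assumes the 32-bit SBCL lowtag layout of the time (three lowtag bits, fixnums tagged 0 and 4, list-pointer lowtag 3); the NIL address is invented for the example, the function names are hypothetical, and none of this is the VOP code itself.

```python
# Assumed 32-bit lowtag layout; the constants are illustrative.
LOWTAG_MASK = 0b111
FIXNUM_TAG_MASK = 0b011        # even-fixnum (0) and odd-fixnum (4) lowtags
LIST_POINTER_LOWTAG = 0b011

HYPOTHETICAL_NIL = 0x0100000B  # made-up address carrying the list-pointer lowtag


def fixnump(word):
    # A %test-fixnum style check: both fixnum lowtags have the low two bits clear.
    return (word & FIXNUM_TAG_MASK) == 0


def listp(word):
    # A %test-lowtag style check; NIL passes because of how it is laid out.
    return (word & LOWTAG_MASK) == LIST_POINTER_LOWTAG


def consp(word, nil=HYPOTHETICAL_NIL):
    # There is no separate cons lowtag, so NIL must be filtered out explicitly.
    return listp(word) and word != nil


print(listp(HYPOTHETICAL_NIL), consp(HYPOTHETICAL_NIL))  # True False
print(consp(0x20000003), fixnump(5 << 2))                # True True
```

The symbol side of the problem (NIL failing a plain symbol-widetag test) needs the object header and is not modelled here.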
* Copied !define-type-vops and the supporting VOPs and functions into src/compiler/arm/type-vops.lisp, removing the simple- variant machinery, elaborating on the KLUDGE comment, and disabling the VOP temporary registers until such time as I know what code to generate.

Next up, src/compiler/aliencomp failed to build because VOPs call-out, alloc-number-stack-space and dealloc-number-stack-space don't exist. I happen to know from other investigation that the alien machinery is largely separate from the rest of the compiler, and my current goals (getting to make-host-2 so that I can get lisp code generation working) don't require it, so...

* Removed src/compiler/aliencomp from the build order as a temporary measure.

Next up, src/compiler/ir2tran failed to build: 74 ERROR conditions, of which the 33 that I can see in my scrollback are all from undefined VOPs referenced by name. This is the heart of the implementation of the calling convention, non-local exits and binding/unbinding. And possibly a few other things as well.

So, the first thing to do is to obtain a proper build log. M-x shell comes in handy here. Next is to obtain from the build log a full list of missing VOPs required for ir2tran and a list of any errors not caused by missing VOPs.
    MOVE MAKE-VALUE-CELL VALUE-CELL-REF FAST-SYMBOL-VALUE SYMBOL-VALUE
    FAST-SYMBOL-GLOBAL-VALUE SYMBOL-GLOBAL-VALUE FDEFN-FUN SAFE-FDEFN-FUN
    CURRENT-STACK-POINTER MAKE-CLOSURE CLOSURE-INIT VALUE-CELL-SET SET
    %SET-SYMBOL-GLOBAL-VALUE PUSH-VALUES BRANCH BRANCH-IF
    SB!VM::STEP-INSTRUMENT-BEFORE-VOP CURRENT-FP ALLOCATE-FRAME
    KNOWN-CALL-LOCAL MULTIPLE-CALL-LOCAL CALL-LOCAL TAIL-CALL-NAMED TAIL-CALL
    ALLOCATE-FULL-CALL-FRAME CALL MULTIPLE-CALL-NAMED MULTIPLE-CALL
    XEP-ALLOCATE-FRAME COPY-MORE-ARG SETUP-CLOSURE-ENVIRONMENT CLOSURE-REF
    SETUP-ENVIRONMENT KNOWN-RETURN RETURN-SINGLE RETURN RETURN-MULTIPLE
    TAIL-CALL-VARIABLE MULTIPLE-CALL-VARIABLE CALL-VARIABLE RESET-STACK-POINTER
    %%NIP-VALUES VALUES-LIST %MORE-ARG-VALUES BIND UNBIND UNWIND THROW
    CURRENT-BINDING-POINTER MAKE-CATCH-BLOCK MAKE-UNWIND-BLOCK SET-UNWIND-PROTECT
    NLX-ENTRY-MULTIPLE NLX-ENTRY UWP-ENTRY UNBIND-TO-HERE LIST LIST*
    NIL-FUN-RETURNED-ERROR

And no errors from any other cause.

These VOPs can be broken down into groups based on functionality, file in which they are defined, etc., but for right now we can pick off a few easy groups.

1.) Calling convention. These include anything with CALL, RETURN or ALLOCATE-FRAME in their name, along with VALUES-LIST and the complex arglist helpers with MORE-ARG in their name. And CURRENT-FP.
2.) Non-local exits. These are NLX-ENTRY, NLX-ENTRY-MULTIPLE, UWP-ENTRY, THROW, MAKE-CATCH-BLOCK and MAKE-UNWIND-BLOCK. These also include SAVE-DYNAMIC-STATE and RESTORE-DYNAMIC-STATE from yesterday, which gives a bit of a suggestion as to how we can go about dummying things up to get further along in the build.
3.) Closure environments. This obviously includes the four VOPs that have CLOSURE in their name. It less obviously (took me a few minutes of digging to find out) includes the three VOPs that have VALUE-CELL in their name.
4.) Dynamic binding. These are BIND, UNBIND, CURRENT-BINDING-POINTER and UNBIND-TO-HERE.
5.) Flow control. These are BRANCH and BRANCH-IF.
6.) Various accessors. Notably anything involving SYMBOL-VALUE, SYMBOL-GLOBAL-VALUE and FDEFN-FUN.
7.) MOVE. 'nuff said.
8.) Other odd stuff. This would include stack manipulation, list creation, debugging hooks, etc.

I suspect that I'm going to copy in many of these VOPs from the x86 or ppc backend, disabling the code generators in the process, and then backfill later. At the same time, at least some of these items are an opportunity to write some actual code generation for the ARM backend, even without having nailed down the calling conventions. But that's a task for another day.

* 2009-May-21: The simplest group of VOPs to implement from the list obtained yesterday may well be the accessors. Looking at the accessor VOPs in compiler/x86/cell shows that they inherit from cell-ref and cell-set VOPs, which turn out to be in compiler/target/memory. The simplest versions of this file appear to be the alpha, hppa and mips versions, as they have just the four VOPs cell-ref, cell-set, slot-ref and slot-set. The sparc and ppc versions have a bunch of indexed access VOPs and the x86oid versions have cell-setf, conditional-set and xadd VOPs.

The alpha versions of these VOPs have simple generators, being an invocation of LOADW or STOREW, which are "instruction-like" macros defined in compiler/alpha/macros, and are fairly typical for most/all ports. It looks like (LOADW <result> <base> [<offset> [<lowtag>]]) should convert to (inst ldr <result> (@ <base> (- (ash <offset> word-shift) <lowtag>))), and similarly for STOREW.

So, to start with, copy the alpha versions of LOADW and STOREW into compiler/arm/macros, converting them to generate the appropriate ARM code. Then copy the alpha memory VOPs to compiler/arm/memory. Then do a build check, as there shouldn't be any net change from this.

* Implemented LOADW and STOREW macros based on Alpha versions of the same.
* Copied alpha version of memory access VOPs to compiler/arm/memory, removing references to the null and zero SCs (alpha stores these two values in registers).
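The LOADW/STOREW conversion described above boils down to computing a byte displacement and encoding a single LDR or STR with a 12-bit immediate offset. This is a minimal Python sketch assuming 32-bit words (word-shift 2) and using hypothetical helper names; the real macros emit through the assembler rather than returning integers.

```python
WORD_SHIFT = 2     # assumed: 4-byte words
COND_AL = 0b1110   # "always" condition


def slot_displacement(offset, lowtag):
    # (- (ash offset word-shift) lowtag)
    return (offset << WORD_SHIFT) - lowtag


def encode_word_transfer(load, rd, rn, displacement, cond=COND_AL):
    """LDR/STR Rd, [Rn, #+/-imm12] in offset form (P=1, W=0, word access)."""
    magnitude = abs(displacement)
    if magnitude > 0xFFF:
        raise ValueError("displacement does not fit in a 12-bit immediate")
    u_bit = 1 if displacement >= 0 else 0  # U selects add or subtract
    l_bit = 1 if load else 0
    return ((cond << 28) | (0b01 << 26) | (1 << 24) | (u_bit << 23)
            | (l_bit << 20) | (rn << 16) | (rd << 12) | magnitude)


def loadw(result, base, offset=0, lowtag=0):
    return encode_word_transfer(True, result, base, slot_displacement(offset, lowtag))


def storew(value, base, offset=0, lowtag=0):
    return encode_word_transfer(False, value, base, slot_displacement(offset, lowtag))


# Slot 0 of an object whose pointer carries lowtag 3: the displacement is -3.
print(hex(loadw(0, 1, 0, 3)))  # ldr r0, [r1, #-3] -> 0xe5110003
```

Because the U bit handles the sign, subtracting the lowtag never needs a separate instruction as long as the slot is within 4095 bytes of the tagged pointer.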
Build status unchanged, which is as expected.

* 2009-Jun-17:
* Copied ppc version of slot access VOPs to compiler/arm/cell verbatim, as they make no references to machine-specific information. Build status unchanged, which is as expected.
* Copied ppc version of symbol hacking VOPs to compiler/arm/cell, fixing up ASM fragments for the three VOPs with actual INSTs. Down to 68 error conditions from 74.
* Copied ppc versions of closure indexing and value-cell hacking to compiler/arm/cell verbatim, for the same reason as the slot-access VOPs. Somehow we now need the MOVE macro.
* Copied the MOVE macro definition to compiler/arm/macros from ppc version, changing the instruction mnemonic as needed.

Which led to needing indexed memory reference VOPs... which are ppc and sparc specific? I seem to have made a wrong turn somewhere. Some investigation shows that all ports have either indexed memory reference VOPs or use define-full-{sett,reff}er.

* Disabled the CLOSURE-INDEX-REF, FUNCALLABLE-INSTANCE-INFO and SET-FUNCALLABLE-INSTANCE-INFO VOPs.

Back to ir2tran and down to 61 error conditions.

* Copied the alpha version of predicate VOPs to compiler/arm/pred, fixing up ASM fragments for VOPs with INSTs.

The alpha version has no cmov support, which is something we'll want later, I'm sure, as we should easily be able to support it on this architecture (almost every ARM instruction carries a condition field). The alpha version also has no BRANCH-IF emitter (actually, its emitter raises an error). Supposedly this is for test operations that set flags, which is likely to need revisiting sooner rather than later. Down to 57 varie^W error conditions. More tomorrow, possibly.

* 2009-Jun-18: The remaining functional groups are calling convention, non-local exits, dynamic binding, allocation, and "other odd stuff".
Of these, "other odd stuff" is a bit of a catch-all; allocation, calling convention, and non-local exits all require some deep thought about how things work at a low level; and dynamic binding requires a decision about where the binding stack pointer lives.

Our choices for the binding stack pointer are the x86oid static-symbol value slot or the non-x86oid register approach. Or I could make it a build option and do both. One point in favor of the static-symbol approach is sb-alien call-out and callbacks not needing to save and restore the bsp register. One point in favor of the register approach is the indirect addressing mode constraints: ARM loads and stores always address memory relative to a base register, so an absolute static-symbol address must first be built in a register. That said, the same lossage happens for all static symbol access, even just computing the address, so we'll lose for anything we need to compare to NIL. So, at the very least we need to burn a register for NIL. This makes static-symbol address calculations easy enough, so we can go with that approach for the binding stack.

The alpha is another platform that burns a register for NIL. There, the register is one of the "callee saved" registers in the calling convention. The ARM Procedure Call Standard designates r4-r11, with the possible exception of r9, as callee-saved registers. As the compiler doesn't have any frame-pointer-elision optimizations, we need to burn a register for that. The obvious two registers to burn at this point (barring any instruction encoding considerations) are r10 and r11. Making null a constant SC for descriptor-reg apparently requires a move function.

* Redefine r10 and r11 as null and fp (frame-pointer) registers.
* Add a storage class for null similar to the alpha backend.
* Add the null SC to the (dummied out) move functions as needed.
* Tweaked the definition of the assembler @ macro to only do clever things with unary - instead of ignoring all arguments after the first.
* Added binding and unbinding VOPs to compiler/arm/cell based on the unithreaded x86 versions, with generators rewritten as needed.
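The addressing argument above comes down to ARM's operand encoding: a data-processing immediate is an 8-bit value rotated right by an even amount, and a load/store immediate offset is 12 bits. The following Python sketch (encode_arm_immediate is a hypothetical helper name) checks whether a constant fits the rotated-immediate form. An arbitrary absolute address usually doesn't, while a small displacement from NIL held in a register usually does.

```python
def encode_arm_immediate(value):
    """Return (rotate, imm8) with value == imm8 rotated right by 2*rotate, else None."""
    value &= 0xFFFFFFFF
    for rotate in range(16):
        shift = 2 * rotate
        # Rotating left by SHIFT undoes a right rotation by SHIFT.
        rotated = ((value << shift) | (value >> (32 - shift))) & 0xFFFFFFFF
        if rotated <= 0xFF:
            return rotate, rotated
    return None


print(encode_arm_immediate(0x104))       # (15, 65)
print(encode_arm_immediate(0x0100080B))  # None: a made-up address that needs a literal load
```

When this returns None, the constant has to come from a literal pool or be assembled over several instructions, which is the "lossage" that keeping NIL in a register avoids for static symbols.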
Moving right along, allocation is based on the pseudo-atomic mechanism. Pseudo-atomic works over two bits of data, known as the pseudo-atomic and pseudo-atomic-interrupted flags. These bits have been encoded in various ways at different times in the different backends.

Essentially, when a thread is about to do something that shouldn't be interrupted it sets pseudo-atomic. If the runtime catches a deferrable signal while in pseudo-atomic then it sets the pseudo-atomic-interrupted flag, saves the interrupting context, defers all deferrable signals, and resumes execution. Once the thread no longer needs to be uninterruptible it clears the pseudo-atomic flag and then checks the pseudo-atomic-interrupted flag to see if it should cause a pending-interrupt trap to allow the runtime to run the handler for the deferred signal and re-enable the deferrable signals. Or something like that.

The ARM has semantics for unaligned memory access, so we can't use the low bits of our frame or stack pointers for p-a or p-a-i. This means that we need to burn a register for them on a temporary or permanent basis. I don't want to burn a register for the allocation pointer on anything other than a temporary basis, either, so we'll go with a temp register for now.

Ideally, the pseudo-atomic-interrupted flag would already be in our temp register at the end of the pseudo-atomic block. If we store the tn-offset (register number) of the register as our *pseudo-atomic* value then the runtime can find the register and set its value. We need to set the register up to hold its tn-offset so as to populate *pseudo-atomic*. We also need a value for not being in pseudo-atomic, but we hold NIL in a register so that's easily done. When leaving pseudo-atomic we can compare the temp register to its intended value (its tn-offset) and if it's not the same then we need to trap. Sounds like a plan.

* Implemented pseudo-atomic in compiler/arm/macros.

Next up, allocation... Maybe.
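The register-number trick above can be simulated to check the logic. The following Python sketch is a toy model, not runtime code: the class and method names are hypothetical, NIL is a stand-in string, register 8 is an arbitrary choice of temp register, and the value the runtime writes into the temp register (INTERRUPTED) is an assumption, since any value other than the register's own tn-offset would do.

```python
NIL = "NIL"                  # stand-in for the value kept in the null register
INTERRUPTED = "INTERRUPTED"  # assumed marker; anything but the tn-offset works


class ToyThread:
    def __init__(self):
        self.regs = {}            # register number -> contents
        self.pseudo_atomic = NIL  # models the *pseudo-atomic* static symbol
        self.deferred = []        # signals held back during pseudo-atomic
        self.handled = []         # signals whose handlers have run

    def enter_pseudo_atomic(self, temp):
        # Load the temp register with its own tn-offset, then publish it.
        self.regs[temp] = temp
        self.pseudo_atomic = self.regs[temp]

    def leave_pseudo_atomic(self, temp):
        self.pseudo_atomic = NIL
        # If the runtime changed the register, a signal was deferred: trap.
        if self.regs[temp] != temp:
            self.pending_interrupt_trap()

    def deliver_signal(self, sig):
        # What the runtime's handler does with a deferrable signal.
        if self.pseudo_atomic == NIL:
            self.handled.append(sig)
        else:
            self.regs[self.pseudo_atomic] = INTERRUPTED
            self.deferred.append(sig)

    def pending_interrupt_trap(self):
        # Run the deferred handlers now that we are out of pseudo-atomic.
        self.handled.extend(self.deferred)
        self.deferred.clear()


thread = ToyThread()
thread.enter_pseudo_atomic(8)
thread.deliver_signal("SIGALRM")  # deferred; register 8 is clobbered
thread.leave_pseudo_atomic(8)     # mismatch detected, handler runs here
print(thread.handled)             # ['SIGALRM']
```

Clearing *pseudo-atomic* before the comparison means a signal arriving in between is simply handled immediately, so nothing is lost in this model.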
There are two garbage collectors in use in SBCL today. The first is the venerable cheneygc, a single-generation stop-and-copy two-space collector. The second is gencgc, a multi-generation stop-and-copy collector with support for multiple open allocation regions, concurrent allocation by multiple mutator threads, etc.

Clearly we want gencgc, right? Not necessarily, especially at this point. There are a number of tradeoffs, but suffice it to say that, at this point, the interesting one to me is this: gencgc allocation needs to detect overflow of the open allocation region and call special routines that close the allocation region, possibly signal GC, and so on, whereas when cheneygc runs out of memory in the current dynamic space it just sets pseudo-atomic-interrupted and arranges for a GC to run once the pending-interrupt trap goes off. Cheneygc is thus simpler to implement right now, which is my main concern. It can be changed later if we get so far as to attempt to add threading support.

Our next issue is the location of the allocation pointer. All non-x86oid ports have a dedicated ALLOC register for this. Threaded x86oids store it in the TLS block. Non-threaded x86oids store it in a C global and refer to it via an absolute fixup. Our search for a precedent leads us once more to the CMUCL CVS repository and the RT and OLD-RT backends, where we find that they stored the allocation pointer as a static symbol value. Problem solved, more or less.

I'm tempted, at this point, to write out an allocation sequence inline a few times and then build the allocation macros on that basis. I'm also tempted to leave allocation until later.

* 2010-May-24: Yesterday I looked things over again, checked in the pseudo-atomic code that I wrote nearly a year ago, and forward-ported to HEAD. Only one conflict during rebase, which was nice. Something doesn't seem to be thought through correctly with the whole register-allocation/GC/memory-model thing, but I'm not sure what at this point.
A great number of things need to be able to compute addresses within the current stack frame. This is not implemented at the instruction level, nor have any decisions been made with respect to data location besides that the stack grows downwards. One concern here is that the frame pointer is loaded from the stack pointer, thus we need to know if the stack pointer points to the top element of the stack or just beyond it. At the same time, this is mere convention, and can be changed later, so we may as well just pick one, write down what choice was made, and move on, knowing that we can fix it later. This is even more the case if we wrap the address calculations in named functions such as the x86oid frame-byte-offset and frame-word-offset.

So, it turns out that we already have such functions. Must have come in at some point when copying a bunch of stuff from one of the target/vm files. Moving right along...

A quick investigation of the PPC instruction definitions shows no special magic for TNs in memory operands. Said special magic is in the MOVE functions... which we have stubbed out. Good enough for now.

Copied a pile of MOVE VOP things from ppc/move, things like MOVE and MOVE-ARG and appropriate setup, along with ILLEGAL-MOVE, but not any of the move-and-convert operations. Removed the ZERO storage class from MOVE and MOVE-ARG VOPs as we don't have a ZERO register (at least, I don't remember one right now, and we can always put it back later). Down from 54 errors to 53 errors.

* EOF