Notes on SBCL/Win32 threading.

Alastair Bridgewater, December 2006.

  Windows is an unusual platform for SBCL, not just because of the
memory management and signals vs. exception handling differences, but
also because of its calling conventions, API design, the enforced
constraints on thread behavior, the available synchronization
primitives, and the use of threads throughout the system. As such,
there are a number of system-enforced limits that we must heed (either
by arranging not to transgress, or by being able to recover from what
the system will do when we transgress) and a number of things that we
can expect the system or alien code to do that we must be prepared to
handle.


Important terms:

  * Thread: A preemptively-scheduled SMP-capable thread.

  * Fiber: A cooperatively-scheduled (green) thread, which executes
    in the context of a Thread (these are SMP-capable if you have
    multiple Threads running Fibers).

  * Exception Handling: The OS-provided equivalent of everything from
    HANDLER-BIND to UNWIND-PROTECT to CATCH. Most of the things for
    which a POSIX system would use a signal such as SIGILL, SIGSEGV,
    SIGFPE or similar are done via Exception Handling on Windows.

  * Exit Unwind: A final walk through all of the SEH Frames for a
    Thread done immediately prior to process exit due to an unhandled
    exception.

  * SEH: Structured Exception Handling. See Exception Handling.

  * Handle: An opaque value representing a kernel object such as an
    open file, a thread, a process, and so on.

  * Guard Page: An allocated page of memory that will cause an
    exception the first time that it is accessed and thereafter
    behaves as normal memory.


System constraints:

  * Each Thread has a system-allocated control stack. It is possible
    to specify, at Thread creation time, the size of this stack.

  * A Thread may not switch its control stack. The one it is granted
    at birth is the one it retains for life, barring intervention by
    use of the Fiber functions.

  * There is no sigaltstack(). All exceptions are handled on the
    current stack of a Thread.

  * If a Thread tries to overflow its stack then it will hit a Guard
    Page. The system will raise an exception at this point. If the
    Thread overflows its stack again without having restored the
    Guard Page then the system will kill at least the Thread if not
    the entire process.

  * Threads which call ExitThread() do not get an Exit Unwind.

  * Before a Thread can call SwitchToFiber() it must call
    ConvertThreadToFiber(), but there is no way prior to Vista to
    know if a Thread created by alien code is already a Fiber, and
    the documentation is spectacularly unclear on the consequences of
    using the Fiber functions improperly (such as calling
    SwitchToFiber() without calling ConvertThreadToFiber(), or calling
    ConvertThreadToFiber() when the Thread has already been
    converted).

  * There are no segment descriptor management functions, which means
    that the usual mechanism that SBCL uses for TLS on x86 can't be
    used on Win32 (see below for alternatives).

  * There is no mechanism for asynchronous delivery of a notification
    to a Thread: the Thread must poll for such events periodically.


Situations we need to allow for:

  * Alien code creating a new Thread which invokes a Lisp callback.

  * Alien code invoking a Lisp callback which calls more Alien code
    which tries to unwind back to the outer level of Alien code. If
    you think this unlikely, consider that we do this to Alien code,
    and turnabout is fair play.


Functionality we need to provide:

  * Stopping a Thread for GC and obtaining its register contents. A
    combination of SuspendThread() and GetThreadContext() should
    suffice here.

  * Stopping a Thread in order to execute an asynchronous function
    call (INTERRUPT-THREAD). The same approach as stopping for GC,
    plus SetThreadContext() and building a fake stack frame, should
    suffice here.


Opportunities for excellence:

  * Thread control stacks have a lazy-commit feature for memory
    allocation. We should be able to do the same for the alien and
    binding stacks.

  * While we need to maintain a thread structure and TLS block for
    Alien-allocated threads (to maintain global-variable state), we
    should be able to pool their alien and binding stacks if they are
    allocated separately from the TLS block.

  * If we are doing separate allocation for thread stacks, we can
    provide an option to set the default sizes (for when an
    alien-created Thread calls in) and the sizes used for a
    Lisp-created Thread.

  * If a Thread is in alien code there should be no need to stop it
    in order to run a GC. If we store a chain of stack regions which
    are used for Lisp code, and have a locking mechanism so that the
    Thread will stop itself until GC is over if the alien function
    returns, then we shouldn't have to interrupt the Thread
    preemptively.

  * With lazy-commit stack memory and stack guard pages, an obvious
    time to reduce the commit footprint and reinstate blown guard
    pages is during GC.

EOF
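
Sketch of the "no need to stop a Thread in alien code" idea:

  The locking mechanism suggested under "Opportunities for excellence"
can be modelled in a few lines. What follows is a minimal, portable
sketch in Python, not SBCL code and not Win32 code. Every name in it
(GcGate, enter_alien, leave_alien, begin_gc, end_gc) is hypothetical
and chosen only for illustration. It assumes that Threads running Lisp
code are still stopped by some other means (such as SuspendThread()),
and it leaves out the chain of Lisp stack regions entirely.

```python
import threading


class GcGate:
    """Hypothetical model: Threads in alien code are never suspended
    for GC; they block themselves if they return while GC is running."""

    def __init__(self):
        self._cond = threading.Condition()
        self._gc_active = False
        self._in_alien = set()  # ids of Threads currently in alien code

    def enter_alien(self, tid):
        # Called just before a Thread leaves Lisp for alien code.
        with self._cond:
            self._in_alien.add(tid)

    def leave_alien(self, tid):
        # Called when the alien function returns. The Thread stops
        # itself here until any GC in progress is over.
        with self._cond:
            while self._gc_active:
                self._cond.wait()
            self._in_alien.discard(tid)

    def begin_gc(self, all_tids):
        # Returns the Threads that still need a preemptive stop,
        # i.e. those not currently in alien code.
        with self._cond:
            self._gc_active = True
            return [t for t in all_tids if t not in self._in_alien]

    def end_gc(self):
        # Releases every Thread parked in leave_alien().
        with self._cond:
            self._gc_active = False
            self._cond.notify_all()
```

  In this model the collector calls begin_gc() with the ids of all
known Threads, suspends only the ones it gets back, and calls end_gc()
when it is done. A Thread that was in alien code the whole time is
never touched; one that returns mid-collection waits in leave_alien().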