Thread-local storage. Each thread has a thread control block, or TCB. The TCB is a variable-size structure headed by `struct tls_tcb' from , with: (a) static thread-local storage for the TLS data of initial objects, i.e., those loaded at startup rather than those dynamically loaded by dlopen (b) a pointer to a dynamic thread vector (DTV) for the TLS data pointers of objects that use global-dynamic or local-dynamic models (typically shared libraries or dlopenable modules) (c) the pthread_t pointer The per-thread lwp private pointer, also sometimes called TP (thread pointer), managed by the _lwp_setprivate and _lwp_setprivate syscalls, either points at the TCB directly, or, on some architectures, points at tp = tcb + sizeof(struct tls_tcb) + TLS_TP_OFFSET. This bias is chosen for architectures where signed displacements from TP enable twice the range of static TLS offsets when biased like this. Architectures with such a tp/tcb offset must provide void *__lwp_gettcb_fast(void); in machine/mcontext.h and must define __HAVE___LWP_GETTCB_FAST in machine/types.h to reflect this; otherwise they must provide __lwp_getprivate_fast to return the TCB pointer. Each architecture has one of two TLS variants, variant I or variant II. Variant I places the static thread-local storage _after_ the fixed content of the TCB, at increasing addresses (increasing addresses grow down in diagram): +---------------+ | dtv pointer | tcb points here (struct tls_tcb) +---------------+ | pthread_t | +---------------+ | obj0 tls | obj0->tlsoffset = 0 | | | | +---------------+ | obj1 tls | obj1->tlsoffset = 3 +---------------+ | obj2 tls | obj2->tlsoffset = 4 | | . . . . . . | | +---------------+ | objN tls | objN->tlsoffset = k +---------------+ Variant II places the static thread-local storage _before_ the fixed content of the TCB, at decreasing addresses: +---------------+ | objN tls | objN->tlsoffset = k +---------------+ | obj(N-1) tls | obj(N-1)->tlsoffset = k - 1 . . . . . . | | +---------------+ | obj2 tls | obj2->tlsoffset = 4 +---------------+ | obj1 tls | obj1->tlsoffset = 3 +---------------+ | obj0 tls | obj0->tlsoffset = 0 | | | | +---------------+ | tcb pointer | tcb points here (struct tls_tcb) +---------------+ | dtv pointer | +---------------+ | pthread_t | +---------------+ See [ELFTLS] Sec. 3 `Run-Time Handling of TLS', Figs 1 and 2, for bigger pictures including the DTV and dynamically allocated TLS blocks. Each architecture also has its own ELF ABI processor supplement with the architecture-specific relocations and TLS details. References: [ELFTLS] Ulrich Drepper, `ELF Handling For Thread-Local Storage', Version 0.21, 2023-08-22. https://akkadia.org/drepper/tls.pdf https://web.archive.org/web/20240718081934/https://akkadia.org/drepper/tls.pdf Steps for adding TLS support for a new platform: (1) Declare TLS variant in machine/types.h by defining either __HAVE_TLS_VARIANT_I or __HAVE_TLS_VARIANT_II. (2) _lwp_makecontext has to set the reserved register or kernel transfer variable in uc_mcontext according to the provided value of `private'. Note that _lwp_makecontext takes tcb, not tp, as an argument, so make sure to adjust it if needed for the tp/tcb offset. See src/lib/libc/arch/$PLATFORM/gen/_lwp.c. This is not possible on the VAX as there is no free space in ucontext_t. This requires either a special version of _lwp_create or versioning everything using ucontext_t. Debug support depends on getting the data from ucontext_t, so the second option is possibly required. (3) _lwp_setprivate(2) has to update the same register as _lwp_makecontext uses for the private area pointer. Normally cpu_lwp_setprivate is provided by MD to reflect the kernel view and enabled by defining __HAVE_CPU_LWP_SETPRIVATE in machine/types.h. cpu_setmcontext is responsible for keeping the MI l_private field synchronised by calling lwp_setprivate as needed. cpu_switchto has to update the mapping. _lwp_setprivate is used for the initial thread, all other threads created by libpthread use _lwp_makecontext for this purpose. (4) Provide __tls_get_addr and possible other MD functions for dynamic TLS offset computation. If such alternative entry points exist (currently only i386), also add a weak reference to 0 in src/lib/libc/tls/tls.c. The generic implementation can be found in tls.c and is used with __HAVE_COMMON___TLS_GET_ADDR. It depends on __lwp_getprivate_fast (see below). (5) Implement the necessary relocation records in mdreloc.c. There are typically three relocation types found in dynamic binaries: (a) R_TYPE(TLS_DTPOFF): Offset inside the module. The common TLS code ensures that the DTV vector points to offset 0 inside the module TLS block. This is normally def->st_value + rela->r_addend. (b) R_TYPE(TLS_DTPMOD): Module index. (c) R_TYPE(TLS_TPOFF): Static TLS offset. The code has to check whether the static TLS offset for this module has been allocated (defobj->tls_static) and otherwise call _rtld_tls_offset_allocate(). This may fail if no static space is available and the object has been pulled in via dlopen(3). It can also fail if the TLS area has already been used via a global-dynamic allocation. For TLS Variant I, this is typically: def->st_value + rela->r_addend + defobj->tlsoffset + sizeof(struct tls_tcb) e.g. the relocation doesn't include the fixed TCB. For TLS Variant II, this is typically: def->st_value - defobj->tlsoffset + rela->r_addend e.g. starting offset is counting down from the TCB. (6) If there is a tp/tcb offset, implement __lwp_gettcb_fast() __lwp_settcb() in machine/mcontext.h and set __HAVE___LWP_GETTCB_FAST __HAVE___LWP_SETTCB in machine/types.h. Otherwise, implement __lwp_getprivate_fast() in machine/mcontext.h and set __HAVE___LWP_GETPRIVATE_FAST in machine/types.h. (7) Test using src/tests/lib/libc/tls and src/tests/libexec/ld.elf_so. Make sure with "objdump -R" that t_tls_dynamic has two TPOFF relocations and h_tls_dlopen.so.1 and libh_tls_dynamic.so.1 have both two DTPMOD and DTPOFF relocations.