Last modified on 2025-04-03
A note on platforms
This section contains a lot of platform-specific explanations. Platform-specific passages are marked in the text, and the groupings are as follows:
- Glibc, presumably running on Linux;
- only available on FreeBSD;
- only available on OpenBSD;
- available on FreeBSD, OpenBSD, and NetBSD (DragonflyBSD is possible, but not guaranteed);
- only on illumos, and likely (Open)Solaris.

Dynamic symbols have a strict order
> [!IMPORTANT]
> This is for Unix and Unix-likes. Windows works differently, which I will write about in a different section.

Dynamic symbols, whether weak or strong, are given an entry in the PLT (for functions) or the GOT (for data). Additional information, in the form of relocations, tells the loader which slot corresponds to which symbol. These symbols are resolved either at startup, or lazily when a function is first called. In either case, the loader picks the same symbol. When resolving a symbol reference, the loader traverses the shared objects it has loaded in a fixed order. Most options accept either a full path or only a name. If only a name is provided, the loader searches the list of directories in the same order to find the shared object. A shared object dependency specified by name can therefore be resolved in a directory specified later in the list.
- A system administrator can modify the list by prepending alternative locations, using:
  - the `LD_PRELOAD` environment variable;
  - the `--preload` command-line option (if `ld.so` is invoked directly rather than as the "interpreter" for the ELF binary);
  - and lastly the `/etc/ld.so.preload` file, if it exists.
- Every executable defines a list of libraries that it wants the loader to provide in `DT_NEEDED`.
- The `DT_RPATH` entry in the dynamic section of the executable defines the search path specific to that executable. It is only used if no `DT_RUNPATH` entry is present. This can be disabled using the `--inhibit-rpath` command-line option.
- The `LD_LIBRARY_PATH` environment variable, except in secure-execution mode. The `--library-path` command-line option can be used to override the environment variable.
- The `DT_RUNPATH` dynamic-section entry. Note that this search path only applies to direct `DT_NEEDED` dependencies; it is not used to resolve the dependencies of dependencies. For `dlopen` and friends, only the `DT_RUNPATH` of the calling object is consulted. This can be disabled using the `--inhibit-rpath` command-line option.
- The `/etc/ld.so.cache` file, followed by the `/lib` and `/usr/lib` folders. 64-bit architectures may also search the `/lib64` and `/usr/lib64` paths.
- An object opened with `dlopen(3)` with the `RTLD_GLOBAL` flag set. The `RTLD_GLOBAL` flag puts the symbols exported by the library into the global namespace. As these symbols are appended to the end of the namespace, a symbol will only resolve into the library if no earlier object defines it. The only likely time that symbols will be resolved into these objects is if one object depends on a symbol exported by an earlier `dlopen` call. A library that wants its own references to prefer its own definitions can be opened with `RTLD_DEEPBIND`.
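The directory search described above boils down to "the first match in an ordered list wins". The following is a minimal C sketch of that rule, not the loader's implementation: `find_in_dirs` is a hypothetical helper, and it assumes the caller has already assembled the directories in search order. It only checks that a file exists; the real loader also consults `/etc/ld.so.cache` and skips objects whose ELF class or machine does not match.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not a loader API: returns the first `dir/soname`
 * that exists, mirroring "first match in the ordered list wins".
 * `dirs` must already be in search order (e.g. DT_RPATH, LD_LIBRARY_PATH,
 * DT_RUNPATH, then the default directories).
 * Returns 0 and fills `out` on success, -1 if nothing matched. */
int find_in_dirs(const char *soname, const char *const dirs[], size_t ndirs,
                 char *out, size_t outlen)
{
    /* A name containing a slash is a path: it is used as-is, not searched. */
    if (strchr(soname, '/') != NULL) {
        int n = snprintf(out, outlen, "%s", soname);
        if (n < 0 || (size_t)n >= outlen)
            return -1;
        return access(out, F_OK) == 0 ? 0 : -1;
    }

    for (size_t i = 0; i < ndirs; i++) {
        int n = snprintf(out, outlen, "%s/%s", dirs[i], soname);
        if (n < 0 || (size_t)n >= outlen)
            continue;            /* path too long: skip this directory */
        if (access(out, F_OK) == 0)
            return 0;            /* later directories are never consulted */
    }
    return -1;
}
```

If both `/opt/app/lib/libz.so.1` and `/usr/lib/libz.so.1` exist and the list is `{"/opt/app/lib", "/usr/lib"}`, the `/opt/app/lib` copy is returned, which is how an earlier directory shadows a system library.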
`name` will be resolved to the first `lib[name].so*` found according to the search order.
If a version is specified, the shared object must match the major version, i.e. `lib[name].so.3*`.
If the desired minor version is greater than what the object provides, a warning is emitted.
In some cases, an object may have different versions depending on different hardware capabilities.
The capabilities can be appended to the filename (e.g. `libcrypto.so.3.5.avx2`), in which case the object is only considered if the current platform supports the capability (platform-specific).
Such objects are placed in directories corresponding to the required capability set, which are searched accordingly.
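The version rule above can be sketched as a small matcher. This is a hypothetical helper written for illustration, assuming file names of the form `lib<name>.so.<major>.<minor>`; it is not any loader's actual code, and it simply ignores a trailing capability suffix.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper modeling the major/minor rule.
 * Returns -1 if `filename` does not match `name` and `major`,
 *          0 if it matches,
 *          1 if it matches but provides an older minor than `want_minor`
 *            (a warning is printed; the object is still usable). */
int match_version(const char *filename, const char *name,
                  long major, long want_minor)
{
    char prefix[256];
    int n = snprintf(prefix, sizeof prefix, "lib%s.so.", name);
    if (n < 0 || (size_t)n >= sizeof prefix)
        return -1;
    if (strncmp(filename, prefix, (size_t)n) != 0)
        return -1;

    const char *p = filename + n;
    char *end;
    long have_major = strtol(p, &end, 10);
    if (end == p || (*end != '\0' && *end != '.') || have_major != major)
        return -1;               /* the major version must match exactly */

    long have_minor = 0;         /* "libfoo.so.3" is treated as 3.0 */
    if (*end == '.') {
        p = end + 1;
        have_minor = strtol(p, &end, 10);
        if (end == p)
            return -1;
        /* anything after the minor, e.g. ".avx2", is ignored here */
    }

    if (have_minor < want_minor) {
        fprintf(stderr, "warning: %s: minor version >= %ld expected\n",
                filename, want_minor);
        return 1;
    }
    return 0;
}
```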
Auditing and namespaces
Both glibc and illumos have a separate namespace that enables the "auditing" of loader behavior. This is probably not interesting from a language perspective, unless it could be (ab)used to do things like runtime lifetime verification. Also, the glibc implementation seems to have some bugs (besides being limited to 16 namespaces) that the developers don't care about resolving.
dlopen
A call to `dlopen` takes either a name or a file path, in addition to configuration flags.
Unless the name contains a slash (i.e. it is a relative or absolute path), `dlopen` searches the same directories as above.
`fdlopen` opens a shared object from an already-open file descriptor. This can prevent TOCTOU races, and works together with Capsicum capabilities (platform-specific).
The handle returned by `dlopen` can be used to programmatically interact with the loaded object.
There are three main uses for this handle: `dlsym`, `dlinfo`, and `dlclose`.
Closing a shared object is complicated, and will receive its own section.
The configuration flags are as follows.
- `RTLD_LAZY` or `RTLD_NOW`: references required by the object are resolved lazily or immediately, respectively. Lazy binding only applies to function references; references to data are always bound when the object is loaded.
- `RTLD_GLOBAL` or `RTLD_LOCAL`: whether the symbols should be available globally, or only through `dlsym` and the returned handle.
- `RTLD_NODELETE`: the object is not unloaded during `dlclose`, so it stays mapped and its static variables are not reinitialized if it is opened again.
- `RTLD_NOLOAD`: only return a handle if the object is already loaded. Such a call can also "promote" the flags of an existing dependency, e.g. from `RTLD_LOCAL` to `RTLD_GLOBAL`.
- `RTLD_FIRST`: when the returned handle is used with `dlsym`, only search the first object associated with the handle (platform-specific).
- `RTLD_TRACE`: print all objects needed by the referenced shared object to `stdout` and exit the program. Only returns on error (platform-specific).
- `RTLD_DEEPBIND`: all symbol searches originating in the object start with the object's own dependency chain rather than the global search space (platform-specific).
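As a sketch of `RTLD_NOLOAD`, assuming a platform that defines it (glibc does), the two hypothetical helpers below test whether an object is already resident and "promote" an already-loaded object into the global scope. Every successful `dlopen`, including one with `RTLD_NOLOAD`, should be balanced by a `dlclose`, so `is_loaded` closes its handle immediately.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: true if `name` is already loaded in this process.
 * With RTLD_NOLOAD, dlopen returns NULL instead of loading from disk. */
bool is_loaded(const char *name)
{
    void *h = dlopen(name, RTLD_LAZY | RTLD_NOLOAD);
    if (h == NULL)
        return false;
    dlclose(h);                  /* balance the successful dlopen */
    return true;
}

/* Hypothetical helper: "promote" an already-loaded object so that its
 * symbols join the global scope. The returned handle holds a reference
 * that the caller must eventually release with dlclose. */
void *promote_to_global(const char *name)
{
    void *h = dlopen(name, RTLD_NOW | RTLD_NOLOAD | RTLD_GLOBAL);
    if (h == NULL)
        fprintf(stderr, "promote %s: %s\n", name, dlerror());
    return h;
}
```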
The scope in which the `dlopen`ed object searches to resolve symbols can be modified with the following flags (platform-specific):
- `RTLD_GROUP`: do not search outside the object's own dependencies.
- `RTLD_PARENT`: as above, but also make the symbols of the object calling `dlopen` available. The parent's symbols are not available through `dlsym`.
- `RTLD_WORLD`: only symbols of objects opened with `RTLD_GLOBAL` are made available.
dlsym
A call to `dlsym` takes a handle, or a pseudo-handle, as well as a symbol name.
The matching rules for symbols are the same as in other dynamic loading scenarios.
- `NULL` is interpreted as the current shared object (platform-specific).
- `RTLD_SELF` starts the search in the current object, also searching the global namespace (platform-specific).
- `RTLD_DEFAULT` searches the same path as normal symbol resolution.
- `RTLD_NEXT` searches the path after the current object. This is useful to interpose on a library and wrap the actual function.
- `RTLD_PROBE` searches only the symbols in the currently loaded objects, as well as explicitly identified dependencies (platform-specific).
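`RTLD_NEXT` is what makes the usual `LD_PRELOAD` wrapper work. The sketch below wraps `puts(3)` purely as an example; the counter `shim_puts_calls` is hypothetical and only there to show that the wrapper ran. It assumes a libc that defines `RTLD_NEXT` (on glibc, `_GNU_SOURCE` is required).

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical counter: how many calls went through the wrapper. */
unsigned shim_puts_calls = 0;

/* This definition is found before libc's in the search order, so calls to
 * puts() land here; RTLD_NEXT then finds the definition that comes after
 * this object, i.e. the one the caller would otherwise have used. */
int puts(const char *s)
{
    /* Resolved once on first use; a production shim would guard this
     * with pthread_once or an atomic. */
    static int (*real_puts)(const char *) = NULL;
    if (real_puts == NULL) {
        real_puts = (int (*)(const char *))dlsym(RTLD_NEXT, "puts");
        if (real_puts == NULL)
            abort();             /* nothing to forward to */
    }
    shim_puts_calls++;
    return real_puts(s);
}
```

Built as a preload object with something like `cc -shared -fPIC shim.c -o libshim.so` (older glibc versions also need `-ldl`), it can be injected with `LD_PRELOAD=./libshim.so ./program`. Looking up the real function on first call, rather than in a constructor, avoids depending on initialization order.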
`dlvsym` has the same behavior as `dlsym`, except it takes an explicit version string (platform-specific).
The return type of `dlsym` is a void pointer.
If said pointer is `NULL`, it may indicate an error, or it may be the genuine value of the symbol.
A call to `dlerror` needs to be made to interrogate the actual error state, after clearing any stale error with an earlier call to `dlerror`.
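A minimal sketch of the usual pattern: clear the error state, call `dlsym`, then read `dlerror` again. Only a non-`NULL` result from the second `dlerror` means the lookup failed. `lookup_symbol` is a hypothetical helper name.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: distinguishes "the symbol's value is NULL" from
 * "the lookup failed". On success, stores the (possibly NULL) address
 * in *out and returns true. */
bool lookup_symbol(void *handle, const char *name, void **out)
{
    dlerror();                       /* clear any stale error state */
    void *sym = dlsym(handle, name);
    const char *err = dlerror();     /* non-NULL only if this dlsym failed */
    if (err != NULL) {
        fprintf(stderr, "dlsym(%s): %s\n", name, err);
        return false;
    }
    *out = sym;                      /* may legitimately be NULL */
    return true;
}
```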
Matching symbols
TODO: finish

All symbols that conspire to uphold safety guarantees on the same data MUST uphold the same guarantees. This probably means that they MUST resolve out of the same object, or out of a predefined hierarchy. This includes globals and thread-locals.

Tradeoffs in the symbol string:
- Mangling (probably) guarantees type safety. It requires the caller to mangle the identifier in the same manner (either through compiler support or a macro). It is fragile to any change in scheme or type, and makes lifetimes harder.
- Unmangled names are easy to use, but cannot guarantee type information.
  - A wrapper around `dlsym` could be used to recover the information (reify type info into a reachable struct), at the cost of runtime overhead.
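One way to reify type information into a reachable struct is sketched below, assuming each plugin exports a single unmangled symbol that points at a descriptor. All names (`struct plugin_descriptor`, `PLUGIN_ABI_VERSION`, `validate_descriptor`, `load_descriptor`) are hypothetical. The checks only catch mismatches the plugin reports honestly; they cannot prove that the vtable really has the advertised types.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical ABI: a plugin exports one unmangled symbol pointing at a
 * plugin_descriptor, which carries what mangling would otherwise encode. */
#define PLUGIN_MAGIC       0x504C5547u   /* ASCII "PLUG" */
#define PLUGIN_ABI_VERSION 3u
#define PLUGIN_SIGNATURE   "len: size_t (*)(const char *)"

struct plugin_vtable {
    size_t (*len)(const char *);
};

struct plugin_descriptor {
    uint32_t magic;
    uint32_t abi_version;            /* bumped on any incompatible change */
    size_t vtable_size;              /* sizeof(struct plugin_vtable) at build */
    const char *signature;           /* textual type information */
    const struct plugin_vtable *vtable;
};

/* The runtime overhead: every field is checked before the vtable is used. */
const struct plugin_descriptor *validate_descriptor(const void *p)
{
    const struct plugin_descriptor *d = p;
    if (d == NULL || d->magic != PLUGIN_MAGIC)
        return NULL;
    if (d->abi_version != PLUGIN_ABI_VERSION ||
        d->vtable_size != sizeof(struct plugin_vtable))
        return NULL;
    if (d->signature == NULL || strcmp(d->signature, PLUGIN_SIGNATURE) != 0)
        return NULL;
    return d->vtable != NULL ? d : NULL;
}

/* Wrapper around dlsym that only hands out descriptors that pass the check. */
const struct plugin_descriptor *load_descriptor(void *handle, const char *symbol)
{
    return validate_descriptor(dlsym(handle, symbol));
}
```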
Global globals or crate-local
Some systems want their loaded dependencies to use the same globals as the caller; others want them separated. For libraries loaded as "normal" dependencies (e.g. via `LD_PRELOAD`), this must be decided by the loaded object.
A caller may desire to control this itself, or share globals between objects in arbitrary clusterings.
Some globals also need to be shared with the whole process to maintain safety, such as locks guarding non-MT-SAFE resources.
Both sharing and not sharing globals come with important safety invariants.
Shared globals
Sharing globals is desirable if the loaded library is supposed to run within the same context as the caller, such as sharing a `tokio` runtime.
- All functions touching the global must uphold the exact same invariants, even across dependencies. This means that unless the “ABI version” is explicitly specified, every change should be assumed to be incompatible.
- The symbols should resolve to the same address, even if they are defined multiple times. This is guaranteed with `LD_PRELOAD`, and can be guaranteed by passing the correct flags to `dlopen`.
- The globals should be available before the shared object is loaded. This can be achieved by loading a `.so` generated by `--export-globals` with the `RTLD_GLOBAL` flag.
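A minimal sketch of loading the globals first, assuming the shared globals live in their own object (for example the `.so` produced by the `--export-globals` step mentioned above). `load_with_shared_globals` and both paths are hypothetical. Because the globals object is opened with `RTLD_GLOBAL` before the plugin, the plugin's references to those globals bind to it, unless an object earlier in the search order already defines them.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical loader. On success returns the plugin handle and stores the
 * globals handle in *globals_handle; the caller later dlcloses the plugin
 * first, then the globals. Returns NULL on failure. */
void *load_with_shared_globals(const char *globals_path,
                               const char *plugin_path,
                               void **globals_handle)
{
    /* RTLD_GLOBAL adds the globals to the global scope, so objects loaded
     * afterwards resolve their references against these definitions. */
    void *g = dlopen(globals_path, RTLD_NOW | RTLD_GLOBAL);
    if (g == NULL) {
        fprintf(stderr, "globals: %s\n", dlerror());
        return NULL;
    }
    /* The plugin's own symbols can stay local; its undefined references
     * still resolve through the global scope populated above. */
    void *p = dlopen(plugin_path, RTLD_NOW | RTLD_LOCAL);
    if (p == NULL) {
        fprintf(stderr, "plugin: %s\n", dlerror());
        dlclose(g);
        return NULL;
    }
    *globals_handle = g;
    return p;
}
```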
The crate is the globe
Not sharing globals: a plugin system that loads many independent plugins.
- In native `dlopen`, opening a dependency twice does not create two copies, and the globals are also not duplicated. If duplication is desired, a "factory" would need to be made to allocate and initialize the globals. The native behavior is the same as `require` in Lua.
- Certain globals should still be shared, such as locks on system resources.
- If the allocator differs between objects, then pointers allocated by one object must not be deallocated by another.
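If every plugin instance needs its own copy of its "globals", a factory can allocate them per instance. The sketch below is hypothetical (`struct plugin_state` and the `plugin_state_*` functions are made up for illustration). It also keeps allocation and deallocation inside the same object, so that a plugin using a different allocator never has its pointers freed by the host.

```c
#include <stdlib.h>

/* Hypothetical per-instance state: what would otherwise be file-scope
 * globals inside the plugin. */
struct plugin_state {
    unsigned long counter;
};

/* Factory: each caller gets its own, freshly initialized "globals". */
struct plugin_state *plugin_state_new(void)
{
    struct plugin_state *s = malloc(sizeof *s);
    if (s != NULL)
        s->counter = 0;
    return s;
}

/* Exported by the same object that allocated `s`, so the matching
 * deallocator is used even if host and plugin use different allocators. */
void plugin_state_free(struct plugin_state *s)
{
    free(s);
}

/* Example operation on the per-instance state. */
unsigned long plugin_bump(struct plugin_state *s)
{
    return ++s->counter;
}
```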