Symbols, what and where.


Last modified on 2025-04-03

A note on platforms

This section contains a lot of platform-specific explanations, which are marked as such. The groupings are as follows:
  • Glibc, presumably running on Linux;
  • Only available on FreeBSD;
  • Only available on OpenBSD;
  • Available on FreeBSD, OpenBSD, and NetBSD; DragonflyBSD is possible, but not guaranteed;
  • Only on illumos, and likely (Open)Solaris.

Dynamic symbols have a strict order.

[!IMPORTANT] This applies to Unix and Unix-likes. Windows works differently, which I will write about in a different section.

Dynamic symbols, whether weak or strong, are given an entry in the PLT/GOT for functions and data respectively. Additional information is provided to tell the loader which slot corresponds to which symbol. These symbols are resolved at startup time or lazily when the symbol is accessed. In either case, the decision for which symbol is chosen is the same.

When resolving a symbol reference, the loader will traverse the shared objects it has loaded in a fixed order. Most options accept either a full path or only the name. If only a name is provided, the loader searches the list of directories in the same order to find the shared objects. A shared object dependency specified by name can be resolved in a directory specified later in the list.
  1. A system administrator can modify the list by prepending alternative locations.
    1. the LD_PRELOAD environment variable;
    2. the --preload command-line option (if ld.so is invoked directly rather than as the “interpreter” for the ELF binary);
    3. and lastly the /etc/ld.so.preload file, if it exists.
  2. Every executable defines a list of libraries that it wants the loader to provide in DT_NEEDED.
  3. The DT_RPATH entry in the executable's dynamic section defines the search path specific to that executable. This can be disabled using the --inhibit-rpath command-line option.
  4. The LD_LIBRARY_PATH environment variable, which is ignored in secure-execution mode. The --library-path command-line option can be used to override the environment variable.
  5. The DT_RUNPATH entry in the dynamic section. Note that this search path only applies to direct DT_NEEDED dependencies. It is not used to resolve the dependencies of dependencies, or for dlopen and friends. This can be disabled using the --inhibit-rpath command-line option.
  6. The /lib and /usr/lib directories. 64-bit architectures may also search the /lib64 and /usr/lib64 paths.
  7. An object opened with dlopen(3) with the RTLD_GLOBAL flag set. The RTLD_GLOBAL flag puts the symbols exported by the library into the global namespace. As these symbols are put at the end of the namespace, only weak symbols will be resolved into the library. In practice, symbols are most likely to resolve into these objects when one object depends on a symbol exported by an earlier dlopen call. This can be avoided using RTLD_DEEPBIND.
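To get a feel for the list the loader walks, the loaded objects can be enumerated at run time. The following is a minimal sketch, assuming a platform that provides dl_iterate_phdr in <link.h> (glibc does; the _GNU_SOURCE define is a glibc requirement). The function name list_loaded_objects is made up for illustration, and the reported order is the loader's object list, which may not match the symbol lookup scope exactly.

```c
#define _GNU_SOURCE /* assumption: glibc, which hides dl_iterate_phdr otherwise */
#include <link.h>
#include <stdio.h>

/* Callback invoked once for every object the loader currently has mapped. */
static int print_object(struct dl_phdr_info *info, size_t size, void *data)
{
    int *count = data;
    const char *name = "(main executable)";

    (void)size;
    /* glibc reports the main executable with an empty name. */
    if (info->dlpi_name != NULL && info->dlpi_name[0] != '\0')
        name = info->dlpi_name;

    printf("%2d: %s\n", *count, name);
    ++*count;
    return 0; /* a non-zero return value would stop the iteration */
}

/* Hypothetical helper: prints all loaded objects and returns their count. */
int list_loaded_objects(void)
{
    int count = 0;

    dl_iterate_phdr(print_object, &count);
    return count;
}
```

Running a program that calls list_loaded_objects() with and without LD_PRELOAD set is a quick way to see where preloaded objects end up in the list.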

A name will be resolved to the first lib[name].so* found according to the search order. If a version is specified, the shared object must match the major version, i.e. lib[name].so.3*. If the desired minor version is greater than what the object provides, a warning is emitted. In some cases, an object may have different versions depending on different hardware capabilities. The capabilities can be appended to the filename (e.g. libcrypto.so.3.5.avx2), in which case the object is only considered if the current platform supports the capability (platform-specific). Such objects are placed in directories corresponding to the required capability set, which are searched accordingly.

Auditing and namespaces.

Both glibc and illumos have a separate namespace that enables the “auditing” of loader behavior. This is probably not interesting from a language perspective, unless it could be (ab)used to do things like runtime lifetime verification.

Also, it seems like the glibc implementation (besides limiting to 16 namespaces) has some bugs that the developers don’t care about resolving.

dlopen

A call to dlopen takes either a name or a file path, in addition to configuration flags. Unless it is given a full path, dlopen will search the same directories as above. fdlopen can open an object from a file descriptor. This can prevent TOCTOU races, and works together with Capsicum capabilities (platform-specific). The handle returned by dlopen can be used to programmatically interact with the loaded object. There are three main uses for this handle: dlsym, dlinfo, and dlclose. Closing a shared object is complicated, and will receive its own section.

The configuration flags are as follows.
  • RTLD_LAZY or RTLD_NOW: whether the references required by the object should be resolved lazily or immediately, respectively.
  • RTLD_GLOBAL or RTLD_LOCAL: whether the symbols should be available globally or only through dlsym and the returned handle.
  • RTLD_NODELETE: prevents the object from being removed from the address space upon dlclose.
  • RTLD_NOLOAD: only return a handle if the object is already loaded. Such a call can also “promote” the flags of an existing dependency.
  • RTLD_FIRST: When the returned handle is used with dlsym, only search the first object associated with the handle (platform-specific).
  • RTLD_TRACE: Print to stdout all objects needed by the referenced shared object and exit the program. Only returns on error (platform-specific).
  • RTLD_DEEPBIND: All symbol searches originating in the object will start with the object's dependency chain rather than the global search space (platform-specific).
The scope in which the dlopened object will search to resolve symbols can be modified with the following flags (platform-specific):
  • RTLD_GROUP: Do not search outside the object's own dependencies.
  • RTLD_PARENT: As above, but also make the symbols of the object calling dlopen available. The parent's symbols are not available through dlsym.
  • RTLD_WORLD: Only symbols of objects opened with RTLD_GLOBAL are made available.
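The basic open, look up, close cycle can be sketched as follows. This is only an illustration: the library name libm.so.6 is the glibc soname and is an assumption, as is the helper name cos_via_dlopen. Only the portable flags RTLD_NOW and RTLD_LOCAL are used.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Hypothetical helper: loads libm, calls cos(x) through dlsym, and unloads
 * the library again. Returns 0 on success, -1 if the library or the symbol
 * is unavailable. */
int cos_via_dlopen(double x, double *out)
{
    /* RTLD_NOW: resolve all references up front.
     * RTLD_LOCAL: keep the library's symbols out of the global namespace. */
    void *handle = dlopen("libm.so.6", RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    /* POSIX requires that the void * from dlsym converts to a function pointer. */
    double (*cos_fn)(double) = (double (*)(double))dlsym(handle, "cos");
    if (cos_fn == NULL) {
        const char *msg = dlerror();
        fprintf(stderr, "dlsym: %s\n", msg != NULL ? msg : "symbol is NULL");
        dlclose(handle);
        return -1;
    }

    *out = cos_fn(x);
    dlclose(handle); /* cos_fn must not be used after this point */
    return 0;
}
```

On older glibc versions the program has to be linked with -ldl. Passing RTLD_NOLOAD in addition would turn the dlopen call into a pure "is it already loaded?" query.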

dlsym

A call to dlsym takes a handle, or a pseudo-handle, as well as a symbol name. The matching rules for symbols are the same as in other dynamic loading scenarios. The pseudo-handles are as follows.
  1. NULL is interpreted as the current shared object (platform-specific).
  2. RTLD_SELF starts the search in the current object, also searching the global namespace (platform-specific).
  3. RTLD_DEFAULT searches the same path as normal symbol resolution.
  4. RTLD_NEXT searches the objects that come after the current object in the search order. This is useful to interpose on a library and wrap the actual function.
  5. RTLD_PROBE searches only the symbols in the currently loaded objects, as well as explicitly identified dependencies (platform-specific).
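The interposition use of RTLD_NEXT can be sketched as a wrapper around puts. This assumes glibc, where RTLD_NEXT is only declared when _GNU_SOURCE is defined. The counter and the puts_call_count accessor are made up for illustration. Such a wrapper would typically be built into a shared object and loaded with LD_PRELOAD, but it behaves the same when defined in the executable itself.

```c
#define _GNU_SOURCE /* assumption: glibc, which hides RTLD_NEXT otherwise */
#include <dlfcn.h>
#include <stdio.h>

static int puts_calls; /* how many times the wrapper has run */

/* Same name and signature as libc's puts, so references from objects later
 * in the search order bind to this definition instead. */
int puts(const char *s)
{
    static int (*real_puts)(const char *);

    if (real_puts == NULL) {
        /* Continue the search after this object to find the wrapped puts. */
        real_puts = (int (*)(const char *))dlsym(RTLD_NEXT, "puts");
        if (real_puts == NULL)
            return EOF; /* nothing to forward to */
    }
    ++puts_calls;
    return real_puts(s);
}

/* Hypothetical accessor so that callers can observe the wrapper. */
int puts_call_count(void)
{
    return puts_calls;
}
```

The wrapper forwards every call, so the interposition is invisible to callers apart from the counter.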

dlvsym has the same behavior as dlsym, except that it takes an explicit version string (platform-specific).

The return type of dlsym is a void pointer. If said pointer is NULL, it may indicate an error, or it may be the genuine value of the symbol. To tell the two apart, clear the error state with a call to dlerror before dlsym, and call dlerror again afterwards: a non-NULL result means that the lookup failed.
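The dlerror protocol described above can be wrapped in a small helper. This is a sketch using only POSIX calls. The name lookup_symbol and its out-parameter layout are assumptions made for illustration.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical helper: looks up `name` through `handle`.
 * Returns 0 and stores the address in *addr on success (the address itself
 * may legitimately be NULL). Returns -1 and stores the loader's message in
 * *err on failure. */
int lookup_symbol(void *handle, const char *name, void **addr, const char **err)
{
    const char *msg;

    dlerror();                   /* clear any stale error state */
    *addr = dlsym(handle, name);
    msg = dlerror();             /* non-NULL only if the dlsym call failed */

    if (msg != NULL) {
        *err = msg;
        return -1;
    }
    *err = NULL;
    return 0;
}
```

The message returned by dlerror may be overwritten by the next dl* call, so a caller that wants to keep it should copy it first.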

Matching symbols

TODO: finish

All symbols that conspire to uphold safety guarantees on the same data MUST uphold the same guarantees. This probably means that they MUST resolve out of the same object, or a predefined hierarchy. This includes globals and thread-locals.

Tradeoff in symbol string:
  • Mangling guarantees (probably) type safety. Requires the caller to mangle the identifier in the same manner (either compiler support or macro). Fragile to any change in scheme/type. Makes lifetimes harder.
  • Unmangled names are easy to use, but cannot guarantee type information. A wrapper around dlsym could be used to recover the information (reify type info into reachable struct). This adds runtime overhead.
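One way to picture the second option is a descriptor struct that the library exports under the unmangled name and that carries a type signature alongside the function pointer. Everything in this sketch is hypothetical: the struct layout, the signature encoding "i32(i32,i32)", and the names typed_dlsym and check_descriptor are made up to illustrate the idea, not taken from any existing scheme.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

typedef void (*generic_fn)(void);

/* Hypothetical descriptor exported by the library under the plain name. */
struct typed_symbol {
    const char *signature; /* illustrative encoding, e.g. "i32(i32,i32)" */
    generic_fn fn;         /* the function, stored as a generic pointer */
};

/* Library side: an example function and its exported descriptor. */
static int add_i32(int a, int b)
{
    return a + b;
}

const struct typed_symbol add_i32_desc = { "i32(i32,i32)", (generic_fn)add_i32 };

/* Returns the function only if the descriptor carries the expected signature. */
generic_fn check_descriptor(const struct typed_symbol *d, const char *expected)
{
    if (d == NULL || d->signature == NULL || strcmp(d->signature, expected) != 0)
        return NULL;
    return d->fn;
}

/* dlsym wrapper: the symbol names the descriptor (data), not the function. */
generic_fn typed_dlsym(void *handle, const char *name, const char *expected)
{
    const struct typed_symbol *d = dlsym(handle, name);

    return check_descriptor(d, expected);
}
```

The string comparison on every lookup is the runtime overhead mentioned above. The caller still has to cast the returned pointer to the matching function type, so the check only guards against mismatches the signature string can express.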

Global globals or crate-local

Some systems want their loaded dependencies to use the same globals as the caller, others want them separated. For libraries loaded as “normal” dependencies (e.g. via LD_PRELOAD), this must be decided by the loaded object. A caller may desire to control this itself, or share globals between objects in arbitrary clusterings. Some globals also need to be shared with the whole process to maintain safety, such as locks guarding non-MT-SAFE resources. Both sharing and not sharing globals come with important safety invariants.

Shared globals

Sharing globals is desirable if the loaded library is supposed to run within the same context as the caller, such as sharing a tokio runtime.
  • All functions touching the global must uphold the exact same invariants, even across dependencies. This means that unless the “ABI version” is explicitly specified, every change should be assumed to be incompatible.
  • The symbols should resolve to the same address, even if the global is defined multiple times. This is guaranteed with LD_PRELOAD and can be guaranteed by passing the correct flags to dlopen.
  • The globals should be available before the shared object is loaded. This can be achieved by loading an .so generated by --export-globals with the RTLD_GLOBAL flag.

The crate is the globe

Not sharing globals is desirable for, e.g., a plugin system that loads many independent plugins.
  • In native dlopen, opening a dependency twice does not create two copies, and the globals are also not duplicated. If separate copies are desired, a “factory” would need to be made to allocate and initialize the globals. Native behavior is the same as require in Lua.
  • Certain globals should still be shared, such as locks on system resources.
  • If the allocator differs between objects, then pointers allocated by one object must not be deallocated by another.
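The “factory” idea and the allocator rule can be combined in one pattern: the plugin exports a constructor that hands out per-instance state instead of using globals, plus a destructor so that the state is freed by the same object that allocated it. A minimal sketch, in which all names (plugin_state and its functions) are hypothetical:

```c
#include <stdlib.h>

/* Per-instance state that would otherwise live in globals of the plugin. */
struct plugin_state {
    int counter;
};

/* Factory: every caller gets its own, independently initialized state. */
struct plugin_state *plugin_state_new(void)
{
    struct plugin_state *s = malloc(sizeof *s);

    if (s != NULL)
        s->counter = 0;
    return s;
}

/* Example operation that touches only the instance it is given. */
int plugin_state_bump(struct plugin_state *s)
{
    return ++s->counter;
}

/* Destructor exported by the same object, so that the memory is released
 * by the allocator that produced it. */
void plugin_state_free(struct plugin_state *s)
{
    free(s);
}
```

A host would look up all three functions with dlsym and never call its own free on a plugin_state pointer.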

On generics