Considerations of dynamism

A list of various characteristics of dynamism and their tradeoffs. These are a synthesis of the concrete use cases.

Loading time

There are roughly speaking three points at which separately compiled programs can be combined into one program.

Static linking. Separately compiled components can be linked together as a compile step. In C this corresponds to linking .o or .a files. This makes link-time performance less of a concern, as ideally this step is done once and the resulting binary is distributed. Separate compilation still gives advantages over compiling everything from source. It would also enable the use of proprietary parts of a program, such as a DRM library or a device driver. Adding custom tooling to this step would not be as burdensome as for the other two, as it is only a development dependency.
Loading on startup. The libraries would either be encoded in the binary or overridden using a mechanism like LD_PRELOAD. The time to link becomes more important, as the cost would be paid every time a program is started. Security also becomes a factor, as loading (and resolving) all symbols at once enables marking the indirection tables as read-only. Dynamism is limited: the choice of which implementation to load can only be made before launching the program, either at compile time, by modifying the linker search path, or by changing which shared objects live on disk. Because the overhead is paid only once per process, it becomes less of a concern for longer-lived programs. (Conversely, the cost has to be paid on every startup, and with lazy linking it may cause unpredictable overhead. For this reason, long-lived programs whose startup time is important, like systemd, are moving towards using the third option for loading dynamic libraries.)
Loading any time. Using the dlopen(3) facility, programs can load a shared library at any time. This library call gives the program a great deal of control over when and which libraries are opened. This also avoids loading facilities entirely if they’re not needed in the specific configuration or invocation. Adding overhead to this step would come at a high cost, as this step could happen at any point in the program. A library loaded in this manner can also be unloaded using dlclose(3), the topic of the next section.
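
As a rough illustration of loading at an arbitrary point in the program, the sketch below declares dlopen(3), dlsym(3) and dlclose(3) by hand and looks up a symbol in the already running process. The helper name resolve_in_process is made up for this example, and the value 2 for RTLD_NOW is an assumption that holds on Linux and Mac OS X. Real code would more likely go through a crate such as libloading.

```rust
use std::ffi::{c_char, c_int, c_void, CString};

// Hand-written declarations of the POSIX dynamic loading calls.
unsafe extern "C" {
    fn dlopen(filename: *const c_char, flags: c_int) -> *mut c_void;
    fn dlsym(handle: *mut c_void, symbol: *const c_char) -> *mut c_void;
    fn dlclose(handle: *mut c_void) -> c_int;
}

// Assumption: RTLD_NOW is 2 on Linux and Mac OS X; check <dlfcn.h> on other platforms.
const RTLD_NOW: c_int = 2;

/// Hypothetical helper: returns the address of `name` if it is visible in the
/// global symbol scope of the running process, and `None` otherwise.
pub fn resolve_in_process(name: &str) -> Option<usize> {
    let name = CString::new(name).ok()?;
    unsafe {
        // A null filename requests a handle for the main program itself.
        let handle = dlopen(std::ptr::null(), RTLD_NOW);
        if handle.is_null() {
            return None;
        }
        let address = dlsym(handle, name.as_ptr());
        // Dropping our reference does not unload the main program.
        dlclose(handle);
        if address.is_null() {
            None
        } else {
            Some(address as usize)
        }
    }
}
```

Passing a path instead of a null pointer would load a separate shared object at that moment, which is where the control over when and which libraries are opened comes from.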

Various platforms also offer their own extensions to dlopen(3).

Glibc adds namespacing support through dlmopen(3). This prevents symbols in different namespaces from interfering with each other. This feature is also supported by the illumos linker, with an arbitrary number of namespaces, but not by {Free,Open,Net}BSD or Mac OS X.
FreeBSD adds the fdlopen(3) and dlvsym(3) functions to give more control over which shared object is referenced. fdlopen(3) is not available on most other Unix-likes, and dlvsym(3), although also provided by glibc as a GNU extension, is missing from Mac OS X.
- fdlopen takes a file descriptor rather than a name or path.
- dlvsym takes an explicit version that the library must match. (The versioning of .so files is decidedly limited. The major version must match exactly, like in semver. If the minor version of the library is smaller than the one specified by the program, a warning is printed and execution continues.)
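
The version rule described above can be written down as a small decision function. This is only a sketch of the rule as stated in the text, not the code of any actual loader; the names SoVersion, Compatibility and check_version are hypothetical.

```rust
/// Hypothetical major.minor version of a shared object.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct SoVersion {
    pub major: u32,
    pub minor: u32,
}

/// Outcome of comparing the version a program asked for with the one found.
#[derive(Debug, PartialEq, Eq)]
pub enum Compatibility {
    Compatible,
    /// The library is older than requested: warn, but keep running.
    CompatibleWithWarning,
    Incompatible,
}

pub fn check_version(requested: SoVersion, found: SoVersion) -> Compatibility {
    if requested.major != found.major {
        // The major version must match exactly.
        Compatibility::Incompatible
    } else if found.minor < requested.minor {
        Compatibility::CompatibleWithWarning
    } else {
        Compatibility::Compatible
    }
}
```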

Unloading

To complement dlopen(3), dlclose(3) is provided to unload an object. This is an attractive facility for plugin-like systems, or to switch out which implementation is used. Unloading an object does present a challenge to safety, as referencing symbols of an unloaded library is undefined behavior.

POSIX already ensures that the libraries are reference counted, giving some measure of assurance. POSIX also allows runtimes to keep an object mapped after dlclose(3), so an implementation may decline to unload it at all. The libloading crate adds a lifetime parameter to all symbols loaded from a shared object to try to avoid dangling references.
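
The idea behind such a lifetime parameter can be shown with a toy model. The Library and Symbol types below are stand-ins written for this example and are not the libloading API; the "library" is just a table of Rust functions, so nothing is actually loaded.

```rust
use std::marker::PhantomData;

/// Toy stand-in for a loaded shared object.
pub struct Library {
    exports: Vec<(&'static str, fn(i32) -> i32)>,
}

/// A symbol that borrows from the library it came from.
pub struct Symbol<'lib> {
    function: fn(i32) -> i32,
    _library: PhantomData<&'lib Library>,
}

fn double(x: i32) -> i32 {
    x * 2
}

impl Library {
    pub fn open() -> Library {
        Library { exports: vec![("double", double as fn(i32) -> i32)] }
    }

    /// The returned symbol cannot outlive `self`.
    pub fn get<'lib>(&'lib self, name: &str) -> Option<Symbol<'lib>> {
        self.exports
            .iter()
            .find(|(export, _)| *export == name)
            .map(|&(_, function)| Symbol { function, _library: PhantomData })
    }
}

impl Symbol<'_> {
    pub fn call(&self, x: i32) -> i32 {
        (self.function)(x)
    }
}

// Rejected by the borrow checker, because `symbol` still borrows `library`:
//     let library = Library::open();
//     let symbol = library.get("double").unwrap();
//     drop(library);
//     symbol.call(1);
```

The protection only covers references that go through the wrapper; a raw function pointer copied out of the symbol would escape it, which is why the text says "try to avoid".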

Global variables are especially problematic in the context of unloading modules. Not only must the global not be referenced by any code that remains loaded, the proper destructors must also be called in the right order. Generally, destructors are called in the opposite order of the constructors, but this ordering becomes impossible in the thread-local case. A thread-local should live as long as the thread, and be destructed when the thread returns. A thread-local should also live as long as the module from which it comes, and be destructed when the module is unloaded. These two requirements are irreconcilable without a runtime garbage collector. If the thread-local is destructed when the module is unloaded, every thread loses a variable which it assumed was present. If the thread-local is destructed at the end of the thread, the module may already have been unloaded, so the destructor is no longer available to be called. For this reason, many runtimes refuse to unload libraries with thread-local destructors.
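
The first of the two requirements, that a thread-local is destructed when its thread returns, can be observed in plain Rust without any dynamic loading. In the sketch below the names Guard, SLOT and drops_after_thread_exit are invented for the example; the code that runs in Guard's Drop implementation is exactly the kind of code that would be gone if it lived in a module that had been unloaded before the thread finished.

```rust
use std::cell::RefCell;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

/// Counts how often its destructor has run.
struct Guard(Arc<AtomicUsize>);

impl Drop for Guard {
    fn drop(&mut self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

thread_local! {
    // Each thread gets its own slot; its destructor runs at thread exit.
    static SLOT: RefCell<Option<Guard>> = RefCell::new(None);
}

/// Spawns a thread that fills its thread-local slot, waits for the thread to
/// finish, and reports how many destructors have run by then.
pub fn drops_after_thread_exit() -> usize {
    let counter = Arc::new(AtomicUsize::new(0));
    let for_thread = Arc::clone(&counter);
    thread::spawn(move || {
        SLOT.with(|slot| *slot.borrow_mut() = Some(Guard(for_thread)));
        // The guard is not dropped here; it lives until the thread exits.
    })
    .join()
    .unwrap();
    counter.load(Ordering::SeqCst)
}
```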

Care should also be taken that if a destructor creates a new global variable (through, say, a static C++ variable), the destructor of the new function-scoped global is also called. Most runtimes handle this correctly, but there are occasional bugs.

Types

For idiomatic Rusty library interfaces, we would like to enable as many types as possible to cross the dynamic boundary safely. The stability of such types is discussed in the crABI proposal, but even with only #[repr(C)] we can already support many use cases.

The shape of the type

“Uninterpreted” bytes: many utility libraries have interfaces of (pointers to) raw bytes and a length parameter. This would not allow various commonly occurring wrapper types, which will be discussed later.
Structs with only public fields. By restricting the interface to only public fields, we can guarantee that the fields contain no safety invariants (at least as long as they aren’t marked unsafe, which is not possible as of the time of writing). The C ABI provides a stable layout, and the fields can be included in the type identifier. Adding, reordering, removing, or indirectly changing a field would break ABI compatibility, even if such a change doesn’t always break API compatibility. (Many C/C++ libraries add a buffer to the end of their structs in order to be able to add fields without breaking compatibility. This would still not be fully safe if the library consumer is able to create the structs and all zeroes is not a valid value for the new field. It would also present issues for any hashing algorithm, as only strict equality can be checked.)
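
To make the ABI break from reordering concrete, the sketch below declares the same three public fields in two orders and reports their offsets and total size. HeaderV1 and HeaderV2 are made-up types, and the numbers in the comments assume a target where u32 has 4-byte and u16 has 2-byte alignment, which is the case on mainstream platforms.

```rust
use std::mem::{offset_of, size_of};

#[repr(C)]
pub struct HeaderV1 {
    pub tag: u8,
    pub length: u32,
    pub flags: u16,
}

// Same fields, different order: source-compatible for most users,
// but every offset and the total size change.
#[repr(C)]
pub struct HeaderV2 {
    pub length: u32,
    pub flags: u16,
    pub tag: u8,
}

/// Returns [offset of tag, offset of length, offset of flags, total size].
pub fn layout_v1() -> [usize; 4] {
    [
        offset_of!(HeaderV1, tag),
        offset_of!(HeaderV1, length),
        offset_of!(HeaderV1, flags),
        size_of::<HeaderV1>(),
    ]
}

/// Same report for the reordered struct.
pub fn layout_v2() -> [usize; 4] {
    [
        offset_of!(HeaderV2, tag),
        offset_of!(HeaderV2, length),
        offset_of!(HeaderV2, flags),
        size_of::<HeaderV2>(),
    ]
}
```

Under the stated alignment assumption, layout_v1() yields [0, 4, 8, 12] and layout_v2() yields [6, 0, 4, 8], so code compiled against one version would read the wrong bytes from the other.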

Private fields with safety requirements. The #[export] RFC defines a mechanism for library authors to specify an invariants “hash” that is part of the unsafe promise. This comes with the downside that any change to any function writing to those fields potentially requires updating the hash.
Opaque structs behind pointers, scoped per library. The internal representation and safety requirements can be changed at will without compromising ABI compatibility. Interacting with the struct would come with performance overhead, as even field accesses would need to go through an accessor function found through the GOT. This overhead would only occur when control flow crosses the dynamic boundary; a shared object can still perform all optimizations internally. (This is the same approach that Swift took.) Passing the opaque struct between loaded dynamic libraries is also not possible without extra machinery, such as one of the following:
- There is exactly one instance of the library providing the opaque type, and all other libraries that want to use the type must pass it to that instance.
- The type is tagged with a hash similar to the one in the private fields case. This hash could be generated from the library version, giving a conservative but safe bound.
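
The opaque-struct shape might look roughly as follows from the providing library's side. Counter and the counter_* functions are hypothetical names; in a real shared object the functions would additionally be exported under stable symbol names, which is left out here to keep the sketch independent of the edition.

```rust
/// The field is private, so users only ever hold a pointer to this type.
pub struct Counter {
    value: u64,
}

/// Allocates a counter inside the providing library.
pub extern "C" fn counter_new() -> *mut Counter {
    Box::into_raw(Box::new(Counter { value: 0 }))
}

/// Safety: `counter` must come from `counter_new` and not have been freed.
pub unsafe extern "C" fn counter_increment(counter: *mut Counter) {
    unsafe { (*counter).value += 1 };
}

/// Even a plain field read is a function call across the boundary.
/// Safety: same requirements as `counter_increment`.
pub unsafe extern "C" fn counter_get(counter: *const Counter) -> u64 {
    unsafe { (*counter).value }
}

/// Frees the counter in the same library that allocated it.
/// Safety: `counter` must come from `counter_new` and must not be used afterwards.
pub unsafe extern "C" fn counter_free(counter: *mut Counter) {
    if !counter.is_null() {
        drop(unsafe { Box::from_raw(counter) });
    }
}
```

Because callers never see the layout, the provider could change `value` to a different representation without recompiling its users, at the cost of one call per access.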

Generics and Traits

Generics and their trait bounds can appear in two places: when constructing a new type or when declaring a function. Rust monomorphises all generics at compile time, meaning that a library would need to provide all instantiations that a user would like to call. An escape hatch is provided through trait objects. Deciding which forms can cross the library boundary will affect both expressiveness and performance.

Only concrete types. Even when not allowing any generics, many interfaces still work. Mathematical and cryptographic libraries, bindings to C/C++ code, and system bindings often primarily use concrete types. Additional support could be added to produce various monomorphisations of generically written code, either through a language feature or a macro. For generics with a limited number of inhabitants, such as variously sized numbers or rendering surfaces, this approach works well and permits many optimizations. Unfortunately, creating a symbol for every combination does not scale well, as every possible combination needs to be included in the shared object.
Allow generics, possibly bounded over traits, but without trait methods. More flexibility can be obtained by allowing generic bounds, but restricting the use of trait methods. A small amount of complexity is added to the implementation, as the types inhabiting the generic bound may differ in size, alignment, and Drop implementation. (The Drop implementation is also relevant for types without custom drop logic, as the allocator used in one library may differ from the one used in a different library. Freeing in the wrong allocator is usually undefined behavior.) Many collection types can be implemented within these restrictions, like Vec or LinkedList. Other collections require only a few trait methods, such as those of Eq, Ord, or Hash. Supporting these more complicated collections is explored in the next point.
Provide methods statically. Instead of wrapping all entries in a trait object, we can pass those functions when they are needed. Passing a table as an extra argument with every function call seems to be the best option for this. (Passing a single table containing the trait’s methods is likely preferable over “splatting” the entries into separate arguments.) The alternative would be to pass the methods once, when the collection is created. Passing the table on every function call also allows generic functions that aren’t methods on an object. Moreover, passing it only at creation would not reduce the overhead, as the table can be constructed at compile time (or even link time if required), and only the pointer to it needs to be copied. This table would need to have a stable format. In an ideal world, the table would also be stable when the trait gains methods. Such stability could also be fabricated by constructing the tables when a library is loaded.
Only <dyn Trait>. Another possibility is to only allow trait objects to cross the library boundary. This adds significant overhead, as every object would need to store (a pointer to) its own vtable. It would not require any additional language features, except a stability guarantee of the vtable format. The downside is that only dyn-compatible traits could cross the boundary.
Extend <dyn Trait> to include multiple traits. Currently, trait objects have two additional restrictions compared to generics. Dyn-compatibility cannot be relaxed without compromising safety. The other restriction, that a trait object can only be bound by one trait, can be eased. Currently, it can be emulated by constructing a new trait bounded on the required traits. Adding support for this may be desired if the subtrait way is deemed too unergonomic.
JIT monomorphise all generic methods. This has a very high link-time overhead, and is thus infeasible in many cases. This could be used in cases with static linking, where a compiler is likely to be available anyway. The (non-dynamic) library object would need to ship an uninstantiated version of the struct/method that can be monomorphised at link time.
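
For the first option in this list, producing a fixed set of monomorphisations through a macro could look like the sketch below. The generic function sum and the macro export_sum are invented for the example, and the generated functions are not given stable symbol names here, which a real shared object would need.

```rust
use std::ops::Add;
use std::slice;

/// Generic implementation, written once inside the library.
fn sum<T: Copy + Default + Add<Output = T>>(values: &[T]) -> T {
    values.iter().fold(T::default(), |total, &value| total + value)
}

/// Emits one concrete, C-callable entry point per listed type.
macro_rules! export_sum {
    ($($name:ident => $ty:ty),* $(,)?) => {
        $(
            // Safety: `data` must point to `len` initialized values of the type.
            pub unsafe extern "C" fn $name(data: *const $ty, len: usize) -> $ty {
                sum(unsafe { slice::from_raw_parts(data, len) })
            }
        )*
    };
}

// Every instantiation a user may want has to be listed up front.
export_sum!(sum_u32 => u32, sum_i64 => i64, sum_f64 => f64);
```

The list in the last line is where the scaling problem shows up: a function generic over two or three parameters would need an entry for every combination.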

In cases two and three, the types on the caller and callee side would be different. This could still be done in a type-safe manner. The caller side pretends that a method takes a concrete type, while the callee side only sees the dynamic objects. (This is similar to how languages like Java implement all generics. A collection like ArrayList<T> stores Objects which are cast to the concrete T by the caller.) This would require language support, as currently traits like Ord are not dyn-compatible.
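
A hand-written version of this split, with the method table passed as an extra argument, might look like the following. The callee only sees untyped bytes plus a table, while the caller-side wrapper keeps the typed signature. OrdTable, compare_shim, max_index_erased and max_index are hypothetical names, and the table layout is an assumption made for this sketch rather than a proposed stable format.

```rust
use std::cmp::Ordering;
use std::mem::size_of;

/// Table of what the callee needs to know about the erased element type.
#[repr(C)]
pub struct OrdTable {
    pub size: usize,
    pub compare: unsafe extern "C" fn(*const u8, *const u8) -> i32,
}

/// Caller side: one shim per concrete type, monomorphised in the caller.
unsafe extern "C" fn compare_shim<T: Ord>(a: *const u8, b: *const u8) -> i32 {
    let (a, b) = unsafe { (&*(a as *const T), &*(b as *const T)) };
    match a.cmp(b) {
        Ordering::Less => -1,
        Ordering::Equal => 0,
        Ordering::Greater => 1,
    }
}

/// Callee side: compiled once, works for any element type.
/// Safety: `data` must point to `len` elements matching `table`.
pub unsafe fn max_index_erased(data: *const u8, len: usize, table: &OrdTable) -> Option<usize> {
    if len == 0 {
        return None;
    }
    let mut best = 0;
    for index in 1..len {
        let candidate = unsafe { data.add(index * table.size) };
        let current = unsafe { data.add(best * table.size) };
        let ordering = unsafe { (table.compare)(candidate, current) };
        if ordering > 0 {
            best = index;
        }
    }
    Some(best)
}

/// Caller side: the typed signature that user code sees.
pub fn max_index<T: Ord>(items: &[T]) -> Option<usize> {
    let table = OrdTable { size: size_of::<T>(), compare: compare_shim::<T> };
    unsafe { max_index_erased(items.as_ptr() as *const u8, items.len(), &table) }
}
```

Only max_index_erased would need to live in the shared object; the wrapper and the shim are generated on the caller side, which is the part that would need language support to be automatic.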

Globals and thread-locals

The semantics of global variables in the presence of dynamic loading involve tradeoffs beyond safety and performance. There are two sets of behaviors that offer incompatible programming semantics.

In situations like Rubicon, globals should be scoped to the whole process. This makes the dynamically linked whole act as if it were compiled together. For example, it enables loading multiple instances of the tokio runtime that act as one runtime. It would also be required for dynamically loading libraries that hook into other libraries through globals, such as tracing.

In other cases, it is preferable to have globals be scoped by the shared object they belong to. For example, a process may intentionally load multiple instances of an interpreter that do not interfere with each other. The isolation of globals also becomes important if multiple, incompatible versions of a library are loaded, possibly as indirect dependencies.

A synthesis of both of these requirements is to let the process choose on a library basis. Every shared object would export a list of globals that it desires. The loader would then either allocate new space for those globals to reside in, or it deduplicates them with existing globals. In the general case, this would require including a larger runtime with every program that wants to dynamically load libraries. Platform specific extensions like namespaces can also be used to implement this. The glibc implementation is limited to sixteen namespaces, which may be too restrictive for some use cases.
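
The allocate-or-deduplicate decision can be modelled with a small table. This is a toy model of the bookkeeping only, with no real loader involved; Scope, GlobalTable and resolve are names made up for the sketch, and slots are represented by plain indices instead of memory.

```rust
use std::collections::HashMap;

/// How a library wants one of its globals to be scoped.
#[derive(Clone, Copy, PartialEq, Eq)]
pub enum Scope {
    /// One instance shared by every library in the process.
    Process,
    /// One instance per shared object.
    Library,
}

/// Maps (owner, global name) to the slot that holds the global.
#[derive(Default)]
pub struct GlobalTable {
    slots: HashMap<(Option<String>, String), usize>,
}

impl GlobalTable {
    /// Returns the slot for `global` as requested by `library`, allocating a
    /// new slot only if no matching one exists yet.
    pub fn resolve(&mut self, library: &str, global: &str, scope: Scope) -> usize {
        let owner = match scope {
            Scope::Process => None,
            Scope::Library => Some(library.to_string()),
        };
        let next = self.slots.len();
        *self.slots.entry((owner, global.to_string())).or_insert(next)
    }
}
```

With process scope, two libraries asking for the same name receive the same slot; with library scope, each gets its own, which matches the two behaviors described in this section.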