It seems to me that the discussion in this thread has derailed a bit. That said, I want to offer my opinion on the static vs. dynamic linking debate before returning to the topic.
I did not find much about the benefits of dynamic linking. I did find that the GNU people like the reduced redundancy and smaller total binary size of the system, and I did find this article about dynamic linking in Swift:
https://faultlore.com/blah/swift-abi/
The case of dynamic linking in Swift
After reading it, my understanding is that Apple used dynamic linking to keep parts of the standard library in cache in order to improve efficiency. According to the article, they wanted code sharing so much that they willingly sacrificed cross-module access efficiency to enable it. This mainly works because Apple controls the ecosystem and can ensure that almost all running software uses the exact same dynamic library, so it seems like a bit of a special case.
I would imagine the benefits primarily apply to background tasks, since they should be relatively light on logic and interact with the OS a lot. Otherwise, any active process with a large enough working set would eventually evict the standard library, or at least big parts of it, unless it touches it in its inner loop.
Overall, I don't see many opportunities like that in an open, heterogeneous system. You would need to link against the exact same version of a library, and that library would have to be small (so it fits in cache next to the working set) and frequently used (so no working set evicts it).
The case of dynamic linking in ForwardCom
From an implementation perspective, ForwardCom doesn't lend itself to dynamic linking all that well. In particular, ForwardCom is much more sensitive to memory fragmentation at the macro level; by that I don't mean related functions not being placed next to each other, but code being spread across separate memory segments with different address translations and/or permissions.
Compared to the nearly unlimited number of mappings in deeply nested paging, the proposed memory map in ForwardCom has quite a limited capacity. I am currently thinking of three mapping strategies (a back-of-envelope comparison follows the list):
- Mapping a combination of libraries into a single segment: efficient mapping, limited code sharing, extremely wasteful of RAM
- Mapping each library into its own segment: medium mapping efficiency, full code sharing, still quite wasteful of RAM
- Mapping each library into multiple segments: terrible mapping efficiency, full code sharing, easy to save RAM
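To make the capacity pressure concrete, here is a back-of-envelope sketch in C with entirely made-up numbers (the map capacity of 32 entries, 10 libraries per process, and 3 segments per library are assumptions for illustration, not ForwardCom specifics):

```c
#include <stdio.h>

int main(void) {
    const int map_entries = 32;  /* assumed hardware map capacity per process */
    const int libs        = 10;  /* assumed shared libraries per process      */
    const int segs_per_lib = 3;  /* e.g. code / read-only data / data         */

    printf("strategy 1 (all libs in one segment): %2d entries\n", 1);
    printf("strategy 2 (one segment per lib):     %2d entries\n", libs);
    printf("strategy 3 (several segments per lib):%2d entries\n", libs * segs_per_lib);
    printf("assumed map capacity:                 %2d entries\n", map_entries);
    return 0;
}
```

With these numbers, strategy 3 alone would consume almost the entire map before the process has mapped any of its own code, stack, or heap.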
Each strategy has distinct problems, but dynamic linking in general forces some inefficiency: it limits optimization potential across libraries and requires either patching binary code with long jumps or linking through pointers.
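For illustration, here is a minimal C sketch of the pointer-based variant, which is essentially what ELF's GOT/PLT mechanism does; all names here are made up:

```c
#include <stdio.h>

/* Hypothetical library function that would live in a shared segment. */
static void lib_hello(void) { puts("hello from the library"); }

/* One "GOT slot": the executable calls through this pointer instead of
   calling lib_hello directly, so its own code never needs patching. */
static void (*got_hello)(void);

/* In a real system the dynamic linker fills the slot at load time. */
static void simulate_load_time_binding(void) { got_hello = lib_hello; }

int main(void) {
    simulate_load_time_binding();
    got_hello();   /* every cross-library call pays one extra indirection */
    return 0;
}
```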
On the other hand, one thing ForwardCom hopes to achieve is much more efficient system calls. Putting common functionality in a "software driver" and accessing it via system calls instead of regular dynamic libraries might be another interesting research opportunity.
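A minimal sketch of what that could look like from the application side, assuming the OS assigns each driver a module ID and each entry point a function ID (the IDs, the sys_call signature, and the stub are all hypothetical; on real hardware the stub would be a single, cheap trap into the driver):

```c
#include <stdint.h>
#include <stdio.h>

/* Made-up IDs for a hypothetical string driver. */
enum { DRV_STRING = 7, FN_STRLEN = 1 };

/* Stub standing in for the real system-call mechanism. */
static int64_t sys_call(uint32_t module, uint32_t fn, const void *arg) {
    if (module == DRV_STRING && fn == FN_STRLEN) {
        const char *s = arg;
        int64_t n = 0;
        while (s[n]) n++;
        return n;
    }
    return -1;  /* unknown driver or function */
}

int main(void) {
    /* The application never maps the driver's code into its own address
       space and spends no memory-map entries on it; it only crosses
       into the driver via the call. */
    printf("%lld\n", (long long)sys_call(DRV_STRING, FN_STRLEN, "hello"));
    return 0;
}
```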
Linker symbol lookup
My main question would be how big the payoff is, given that the proposal adds considerable complexity to the object file format. What I do like about hashing is that it avoids touching a bunch of random memory locations during the search. Do we know of any evidence of a linker (static or dynamic) being severely bottlenecked by this lookup? In the case of static linkers, do modern ones such as mold produce (much) worse code than the big, slow ones? I would consider being half as fast as 'cp' (https://github.com/rui314/mold) quite satisfactory.
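For reference, a minimal C sketch of the kind of hash-based symbol lookup I mean, roughly in the spirit of ELF's GNU hash section: one hash computation selects a short chain, so the search touches far fewer random memory locations than a linear scan. The table layout is a simplified stand-in, not the proposed format:

```c
#include <stdint.h>
#include <string.h>
#include <stdio.h>

/* The djb2-style hash also used by ELF's GNU hash section. */
static uint32_t gnu_hash(const char *s) {
    uint32_t h = 5381;
    while (*s) h = h * 33 + (uint8_t)*s++;
    return h;
}

typedef struct { const char *name; uint64_t addr; } Symbol;

/* Toy table: NBUCKETS chains with room for 8 symbols each; a real
   format would size these from the symbol count. */
#define NBUCKETS 4
static const Symbol *buckets[NBUCKETS][8];

static void insert(const Symbol *sym) {           /* assumes chains never fill */
    const Symbol **chain = buckets[gnu_hash(sym->name) % NBUCKETS];
    while (*chain) chain++;
    *chain = sym;
}

static const Symbol *lookup(const char *name) {
    const Symbol **chain = buckets[gnu_hash(name) % NBUCKETS];
    for (; *chain; chain++)
        if (strcmp((*chain)->name, name) == 0)    /* only chain hits compared */
            return *chain;
    return NULL;
}

int main(void) {
    static const Symbol syms[] = { {"memcpy", 0x1000}, {"printf", 0x2000} };
    insert(&syms[0]);
    insert(&syms[1]);
    const Symbol *s = lookup("printf");
    if (s) printf("%s @ 0x%llx\n", s->name, (unsigned long long)s->addr);
    return 0;
}
```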