Search found 80 matches

by HubertLamontagne
2024-02-17, 16:27:49
Forum: forwardcom forum
Topic: Fused push with bounds check
Replies: 5
Views: 11206

Re: Fused push with bounds check

Yeah, this is std::vector::push_back(), right? This is at least 5 micro-ops even in the very best case: - Load [array size, current allocation size, pointer to data] - (second micro-op from 24byte unaligned vector load) - Increment array size and check that it's lower than or equal to the allocated size,...
by HubertLamontagne
2023-01-29, 15:54:35
Forum: forwardcom forum
Topic: Proposals for next version
Replies: 16
Views: 111912

Re: Proposals for next version

I've played around with how to reduce the stalls from not knowing if instructions are going to run or not... and I've come up with "on the fly conditionalization"... for instance, the code: int32 r0 = [r1] int32 r2 = r3 + r4 You'd want r2=r3+r4 to run early, but you can't because th...
by HubertLamontagne
2022-12-02, 18:56:22
Forum: forwardcom forum
Topic: Proposals for next version
Replies: 16
Views: 111912

Re: Proposals for next version

The compiler can't always reorder instructions in order of priority... for instance: int32 r0 = [r1] // high priority int64 r4 += r5 // high priority int32 r0 += 9 int32 r0 *= 7 int32 r0 >>= 2 int32 r0 += 1 int32 r0 |= 32 int32 r0 >>= 1 int32 r0 += 1 int32 [r1] = r0 // high priority int32 r2 = [r3+r...
by HubertLamontagne
2022-11-30, 18:12:20
Forum: forwardcom forum
Topic: Proposals for next version
Replies: 16
Views: 111912

Re: Proposals for next version

High/Low Priority Hint One idea that I don't know if it makes any sense would be to add a hint to instructions to tell if they should run with high priority or low priority. High priority would be used for instructions whose result would influence conditional jumps and memory addressing, and low pr...
by HubertLamontagne
2022-11-30, 17:55:45
Forum: forwardcom forum
Topic: Integer division by zero
Replies: 4
Views: 31577

Re: Integer division by zero

Clearly it's best if a % b equals a - (a/b)*b in all cases, including the error ones (/0, and -0x80000000/-1 in signed 32 bits). That way the CPU doesn't need a modulo instruction and can replace it with a - d*b (where d = a/b) in all cases. As for n/0 returning 0 or int min/max, there's something i...
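A minimal sketch of that identity, emulating signed 32-bit arithmetic in Python. The names `wrap32`, `div32` and `mod32` are hypothetical, and the choice that n/0 returns 0 is only an assumption for illustration (the post also considers int min/max). The point is that once division is defined for every input, the remainder falls out of a - d*b with no special cases:

```python
MASK32 = 0xFFFFFFFF
INT32_MIN = -0x80000000

def wrap32(x):
    """Wrap a Python int to a signed 32-bit two's complement value."""
    x &= MASK32
    return x - 0x100000000 if x & 0x80000000 else x

def div32(a, b):
    """Truncating signed 32-bit division, defined for every input.
    ASSUMPTION: n/0 returns 0 (one of the options discussed)."""
    if b == 0:
        return 0
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return wrap32(q)  # INT32_MIN / -1 overflows and wraps to INT32_MIN

def mod32(a, b):
    """Remainder computed only from the divide result: a - d*b."""
    d = div32(a, b)
    return wrap32(a - d * b)
```

With these assumptions, n % 0 comes out as n, and -0x80000000 % -1 comes out as 0, without the hardware needing a separate modulo path.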
by HubertLamontagne
2022-10-21, 14:31:00
Forum: forwardcom forum
Topic: Proposals for next version
Replies: 16
Views: 111912

Re: Proposals for next version

1. Speculative memory read: I'm not totally sure that this is a net win. I think there are arguments for and against: - Speculative memory reads can be partly emulated for vectorized code on paged memory architectures. As long as the first byte read in a memory page happens for real, then we know that a...
by HubertLamontagne
2022-06-21, 22:16:10
Forum: forwardcom forum
Topic: Casey Muratori stream about Vector lane masking
Replies: 0
Views: 30099

Casey Muratori stream about Vector lane masking

Casey Muratori recently had a stream about what seems to be a design mistake in the very new Risc-V SIMD proposal: https://twitter.com/cmuratori/status/1538622391307251713 Basically, it can only use v0 (the first vector register) for masking, and it only uses the first bits of v0 as mask where each ...
by HubertLamontagne
2022-05-19, 21:26:33
Forum: forwardcom forum
Topic: Bit addressing
Replies: 8
Views: 43128

Re: Bit addressing

There's presumably something nice to be done with succinct trees, as per the paper you linked. I like how they avoid having strings of pointers (which are a massive problem, speed-wise). But this doesn't imply that you should do bit addressing: instead, it's probably optimal to store the tree-structu...
by HubertLamontagne
2022-05-16, 14:06:23
Forum: forwardcom forum
Topic: Bit addressing
Replies: 8
Views: 43128

Re: Bit addressing

Some STM32 microcontrollers have bit-banded memory aliased regions for that purpose: https://micromouseonline.com/2010/07/14/bit-banding-in-the-stm32/ For instance, the memory byte 0x20000000 also appears as eight single-bit word addresses at 0x22000000 0x22000004 0x22000008 0x2200000C 0x22000010 0x22000...
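A small sketch of the alias address calculation, assuming the Cortex-M3 SRAM bit-band mapping described on the linked page: each bit of the 1 MB region starting at 0x20000000 gets its own 32-bit word in the alias region starting at 0x22000000, so consecutive bits of one byte are 4 bytes apart. The function name `bit_band_alias` is hypothetical:

```python
SRAM_BASE = 0x20000000        # start of the bit-band region
SRAM_ALIAS_BASE = 0x22000000  # start of the bit-band alias region
SRAM_REGION_SIZE = 0x100000   # 1 MB of SRAM is bit-addressable

def bit_band_alias(byte_addr, bit):
    """Return the alias word address for one bit of an SRAM byte."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be in 0..7")
    byte_offset = byte_addr - SRAM_BASE
    if not 0 <= byte_offset < SRAM_REGION_SIZE:
        raise ValueError("address is outside the bit-band region")
    # Each byte spans 8 alias words (32 bytes); each bit spans one word.
    return SRAM_ALIAS_BASE + byte_offset * 32 + bit * 4
```

For example, `hex(bit_band_alias(0x20000000, 1))` gives `0x22000004`, and the next byte, 0x20000001, starts its aliases at 0x22000020.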
by HubertLamontagne
2022-04-12, 20:24:13
Forum: forwardcom forum
Topic: Memory safety enforcement using CHERI
Replies: 3
Views: 25797

Re: Memory safety enforcement using CHERI

I gotta admit, I haven't seen other stuff like that yet. Interesting goal, tracking the allowed range of all memory addresses and if they're heap/stack/global data, and even object encapsulation. Not quite sure what to think of it, it reminds me of 16-bit x86's protected mode FAR pointers (and the h...
by HubertLamontagne
2022-01-24, 19:18:11
Forum: forwardcom forum
Topic: Nonlocal control flow
Replies: 10
Views: 42137

Re: Nonlocal control flow

I imagine these usages could be done with an OS call but wouldn't require a full interrupt. So you'd have an intermediary performance level (privilege level change, but no full pipeline-flush-and-context-switch from an interrupt). Which I guess still makes sense since longjmp and triggering exceptio...
by HubertLamontagne
2021-12-19, 23:05:43
Forum: forwardcom forum
Topic: How to avoid memory fragmentation
Replies: 5
Views: 22912

Re: How to avoid memory fragmentation

Looking at the memory layout of various programs using Microsoft's VMMap tool ( https://docs.microsoft.com/en-us/sysinternals/downloads/vmmap ), Windows does manage to keep some fairly large blocks of memory contiguous, often 10+ MB large. Very few memory blocks are just a single 4k page, typical blo...
by HubertLamontagne
2021-07-04, 16:11:16
Forum: forwardcom forum
Topic: Timer, DMA controller, Interrupt controller built into the architecture?
Replies: 0
Views: 29457

Timer, DMA controller, Interrupt controller built into the architecture?

One important part of how the PC became common was the standardized timers, DMA controllers and interrupt controllers based on the chips IBM used in early PCs: - Intel 8237 DMA controller (x2 in PC AT) - Intel 8259 IRQ controller (x2 in PC AT) - Intel 8253/8254 timer On modern PCs these were supplem...
by HubertLamontagne
2021-06-28, 20:56:55
Forum: forwardcom forum
Topic: Universal boolean instruction
Replies: 5
Views: 21573

Re: Universal boolean instruction

I imagine that Intel's implementation is probably just a whole bunch of 8:1 multiplexers (and high fanout buffers), and I think that part of the idea is that the compiler could fold any number of operations on the same 3 inputs by changing the 8 entry lookup table.
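To make the lookup-table idea concrete, here is a rough software model, not Intel's actual circuit. The function names are hypothetical, and the index convention (a is the high bit of the 3-bit index, c the low bit) is an assumption for this sketch. `ternary_logic` evaluates any 3-input boolean function from its 8-bit table, and `table_for` shows how a compiler could fold a chain of operations on the same 3 inputs into a single table:

```python
def ternary_logic(a, b, c, table, width=64):
    """Apply an arbitrary 3-input boolean function bitwise.
    Bit i of `table` is the output for the input combination i = (a<<2)|(b<<1)|c."""
    mask = (1 << width) - 1
    result = 0
    for index in range(8):
        if not (table >> index) & 1:
            continue
        # Select the bit positions whose (a, b, c) bits spell out `index`.
        ta = a if index & 4 else ~a
        tb = b if index & 2 else ~b
        tc = c if index & 1 else ~c
        result |= ta & tb & tc
    return result & mask

def table_for(fn):
    """Build the 8-entry table for a function of three 1-bit inputs."""
    table = 0
    for index in range(8):
        a, b, c = (index >> 2) & 1, (index >> 1) & 1, index & 1
        if fn(a, b, c) & 1:
            table |= 1 << index
    return table
```

For instance, `table_for(lambda a, b, c: (a & b) | c)` folds an AND followed by an OR into the single table 0xEA, which one `ternary_logic` call then evaluates.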
by HubertLamontagne
2021-06-28, 2:11:14
Forum: forwardcom forum
Topic: input/output instructions
Replies: 8
Views: 30071

Re: input/output instructions

Cuminies wrote: 2021-06-26, 19:55:26 multiple versions of forwardcom cpu that have compatible base instruction sets and different extended instruction sets to satisfy hubert an agner both?
Forwardcom is 100% Agner's baby, I wouldn't dare attempt to fork it :P
by HubertLamontagne
2021-06-24, 16:06:03
Forum: forwardcom forum
Topic: Macro-op fusion as an intentional instruction set design choice
Replies: 4
Views: 20209

Re: Macro-op fusion as an intentional instruction set design choice

One extra question here... Not to be too inquisitive, but I was wondering what your exact motivation is for doing LOAD+MATH in a single instruction instead of a LOAD and MATH instruction sequence on Forwardcom. Is it: - Is the goal to build in-order CPU pipelines like the 1st generation Intel A...
by HubertLamontagne
2021-06-24, 15:10:00
Forum: forwardcom forum
Topic: Default integer size 32 or 64 bits?
Replies: 7
Views: 24640

Re: Default integer size 32 or 64 bits?

Can you make it a flag in the cpu? Like a mode? 32/64 bit mode. When reading some instructions it can convert them before anything else happens. I don't know anything about CPUs. Normally, 32/64/32-in-64 bits for memory addresses/pointers is a CPU flag yes, as it affects a ton of other stuff (page ...
by HubertLamontagne
2021-06-16, 17:26:06
Forum: forwardcom forum
Topic: Load From Const Array Instruction
Replies: 3
Views: 9441

Re: Load From Const Array Instruction

It's an interesting idea, for sure. Though I think it would require a 3rd cache - I don't think it can share the instruction cache: - The instruction cache doesn't do word addressing. It loads whole 128bit or 256bit aligned chunks, that are then queued into the so-called "prefetch queue" (...
by HubertLamontagne
2021-04-13, 20:48:42
Forum: forwardcom forum
Topic: Macro-op fusion as an intentional instruction set design choice
Replies: 4
Views: 20209

Re: Macro-op fusion as an intentional instruction set design choice

I imagine that ARM decided on specifically making 4 operand operations into a fusable two-instruction sequence in order to avoid introducing 64bit opcodes in ARM64 when everything else is 32bit only, and to have an escape chute in case nobody used SVE (or if they have so many register file write por...
by HubertLamontagne
2021-04-12, 18:27:55
Forum: forwardcom forum
Topic: Macro-op fusion as an intentional instruction set design choice
Replies: 4
Views: 20209

Macro-op fusion as an intentional instruction set design choice

Arm v9 (basically adding Scalable Vector Extensions to the main instruction set) has an interesting new instruction called MOVPRFX: https://developer.arm.com/documentation/ddi0602/latest/SVE-Instructions/MOVPRFX--unpredicated---Move-prefix--unpredicated--?lang=en The problem that they had is that wi...
by HubertLamontagne
2021-03-22, 14:53:36
Forum: forwardcom forum
Topic: Default integer size 32 or 64 bits?
Replies: 7
Views: 24640

Re: Default integer size 32 or 64 bits?

Arm64 has a fairly extensive solution to this, so it might make sense to look it up. One solution is to use 64 bit instructions to do 32 bit math and let the top 32 bits be junk, except for specific cases (generally operations that propagate bits rightwards): - Right shift and arithmetic right shift...
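The junk-in-the-top-bits approach can be checked with a small model of 64-bit registers in Python. This is only a sketch with hypothetical helper names, not Arm64's actual instruction semantics. Operations that propagate bits leftwards (add, multiply) give a correct low 32 bits regardless of the junk, while a right shift pulls the junk down into the result unless the value is zero-extended first:

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF

def with_junk(x32, junk):
    """Model a 64-bit register holding a 32-bit value with junk in the top half."""
    return ((junk << 32) | (x32 & MASK32)) & MASK64

def add64(a, b):
    return (a + b) & MASK64

def mul64(a, b):
    return (a * b) & MASK64

def shr64(a, n):
    """Logical right shift of a 64-bit register."""
    return (a & MASK64) >> n

def low32(x):
    """Read back only the meaningful 32-bit result."""
    return x & MASK32

def shr32_safe(a, n):
    """Right shift done correctly: clear the junk (zero-extend) first."""
    return shr64(a & MASK32, n)
```

Here `low32(add64(...))` and `low32(mul64(...))` match plain 32-bit wrapping arithmetic for any junk value, whereas `low32(shr64(...))` generally does not.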
by HubertLamontagne
2021-03-21, 15:46:23
Forum: forwardcom forum
Topic: Rollbackable L1 Data Cache Design?
Replies: 7
Views: 16857

Re: Rollbackable L1 Data Cache Design?

Forcing store operations to never be speculative simplifies a lot of these tasks (it kinda turns the cpu into a partially in-order CPU?) but I'm not sure I can come up with a kind of architecture that could do this without causing huge stalls... Assuming no other core is dependent on the data of a ...
by HubertLamontagne
2021-03-19, 15:14:07
Forum: forwardcom forum
Topic: Rollbackable L1 Data Cache Design?
Replies: 7
Views: 16857

Re: Rollbackable L1 Data Cache Design?

In most cases, yes. But not in all cases... You can always pass unaligned pointers to other functions... Though I guess that kind of case can always be handled with a Memory Aliasing Fault, and letting the OS do the split memory access for you and cobbling together the result... Slow, but should hap...
by HubertLamontagne
2021-03-14, 22:03:09
Forum: forwardcom forum
Topic: Rollbackable L1 Data Cache Design?
Replies: 7
Views: 16857

Re: Rollbackable L1 Data Cache Design?

Perhaps the way unaligned loads/stores could be handled is through an alignment predictor... All memory loads/stores are initially predicted to be aligned, and if a memory operation ends up being unaligned, it triggers a branch prediction fail and the load/store operation is recorded in the alignmen...
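A toy model of the part of the idea that is spelled out above: every load/store starts out predicted aligned, and the first unaligned access from a given instruction counts as a misprediction and gets recorded. Everything else here is an assumption for illustration: the class and method names, keying the table by instruction address, treating "unaligned" as not naturally aligned to the access size, and predicting recorded instructions as unaligned from then on.

```python
class AlignmentPredictor:
    """Toy alignment predictor keyed by instruction address (hypothetical)."""

    def __init__(self):
        self.unaligned_pcs = set()  # instructions seen doing unaligned accesses
        self.mispredictions = 0     # each one models a pipeline flush

    def predict_unaligned(self, pc):
        # Default prediction is "aligned" until the instruction is recorded.
        return pc in self.unaligned_pcs

    def execute(self, pc, address, size):
        """Run one memory access and return the prediction that was used."""
        predicted = self.predict_unaligned(pc)
        # ASSUMPTION: unaligned means not naturally aligned to the access size.
        actual = address % size != 0
        if actual and not predicted:
            self.mispredictions += 1
            self.unaligned_pcs.add(pc)
        return predicted
```

In this model an instruction pays the misprediction cost once, on its first unaligned access, and later accesses from the same instruction are handled as unaligned up front.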
by HubertLamontagne
2021-03-12, 16:52:05
Forum: forwardcom forum
Topic: Rollbackable L1 Data Cache Design?
Replies: 7
Views: 16857

Rollbackable L1 Data Cache Design?

Looking at the problem of how to build fast CPU cores, I've come to the conclusion that the key component that differentiates the big boys is to have an L1 data cache that supports rolling back operations that haven't graduated/committed. The simpler CPUs that have this kind of L1 Cache (such as the ...