Search found 80 matches
- 2024-02-17, 16:27:49
- Forum: forwardcom forum
- Topic: Fused push with bounds check
- Replies: 5
- Views: 11206
Re: Fused push with bounds check
Yeah, this is std::vector::push_back(), right? This is at least 5 micro-ops even in the very best case: - Load [array size, current allocation size, pointer to data] - (second micro-op from 24-byte unaligned vector load) - Increment array size and check that it's lower than or equal to the allocated size,...
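For reference, the fast path being counted here could be sketched in C roughly as follows. This is only an illustration: the vec_i32 struct, its three 8-byte fields (size, capacity, data pointer, as listed in the post) and the vec_push_fast name are assumptions made for the sketch, not how any particular std::vector implementation lays things out.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 24-byte vector header on a 64-bit target:
   [array size, current allocation size, pointer to data]. */
typedef struct {
    size_t   size;      /* number of elements currently stored */
    size_t   capacity;  /* number of elements allocated */
    int32_t *data;      /* pointer to the element storage */
} vec_i32;

/* Fast path only: returns 1 on success, 0 when the vector is full
   (a real push_back would reallocate in that case). */
int vec_push_fast(vec_i32 *v, int32_t value)
{
    size_t new_size = v->size + 1;   /* increment array size */
    if (new_size > v->capacity)      /* bounds check against allocation */
        return 0;
    v->data[v->size] = value;        /* store the new element */
    v->size = new_size;              /* write back the new size */
    return 1;
}
```

The load of the header, the increment, the compare-and-branch, the element store and the size store are all visible as separate steps, which is roughly where a count of about 5 micro-ops would come from.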
- 2023-01-29, 15:54:35
- Forum: forwardcom forum
- Topic: Proposals for next version
- Replies: 16
- Views: 111912
Re: Proposals for next version
I've played around with how to reduce the stalls from not knowing if instructions are going to run or not... and I've come up with "on the fly conditionalization"... for instance, the code: int32 r0 = [r1] int32 r2 = r3 + r4 You'd want r2=r3+r4 to run early, but you can't because th...
- 2022-12-02, 18:56:22
- Forum: forwardcom forum
- Topic: Proposals for next version
- Replies: 16
- Views: 111912
Re: Proposals for next version
The compiler can't always reorder instructions in order of priority... for instance: int32 r0 = [r1] // high priority int64 r4 += r5 // high priority int32 r0 += 9 int32 r0 *= 7 int32 r0 >>= 2 int32 r0 += 1 int32 r0 |= 32 int32 r0 >>= 1 int32 r0 += 1 int32 [r1] = r0 // high priority int32 r2 = [r3+r...
- 2022-11-30, 18:12:20
- Forum: forwardcom forum
- Topic: Proposals for next version
- Replies: 16
- Views: 111912
Re: Proposals for next version
High/Low Priority Hint One idea that I don't know if it makes any sense would be to add a hint to instructions to tell if they should run with high priority or low priority. High priority would be used for instructions whose result would influence conditional jumps and memory addressing, and low pr...
- 2022-11-30, 17:55:45
- Forum: forwardcom forum
- Topic: Integer division by zero
- Replies: 4
- Views: 31577
Re: Integer division by zero
Clearly it's best if a % b is equal to a - (a/b)*b in all cases, including the error ones (/0, and -0x80000000/-1 in signed 32 bits). That way the CPU doesn't need a modulo instruction and can replace it with a - d*b (where d = a/b) in all cases. As for n/0 returning 0 or int min/max, there's something i...
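A small C sketch of that identity, for illustration only. The function names are made up, and the policy of n/0 returning 0 is just one of the options mentioned in the post, picked here as an assumption; the point is that the remainder falls out of the quotient in every case, including the error ones.

```c
#include <stdint.h>

/* Division that is defined for all inputs.
   ASSUMPTION: n/0 returns 0; INT32_MIN / -1 wraps to INT32_MIN. */
int32_t div_total(int32_t a, int32_t b)
{
    if (b == 0)
        return 0;
    if (a == INT32_MIN && b == -1)
        return INT32_MIN;
    return a / b;
}

/* Remainder derived purely from the quotient: a - (a/b)*b.
   Unsigned arithmetic is used so the wraparound is well defined;
   the final cast back to int32_t wraps on common compilers. */
int32_t rem_total(int32_t a, int32_t b)
{
    uint32_t q = (uint32_t)div_total(a, b);
    return (int32_t)((uint32_t)a - q * (uint32_t)b);
}
```

With this choice, n % 0 comes out as n and -0x80000000 % -1 comes out as 0, without any special-casing in the remainder itself.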
- 2022-10-21, 14:31:00
- Forum: forwardcom forum
- Topic: Proposals for next version
- Replies: 16
- Views: 111912
Re: Proposals for next version
1. Speculative memory read: I'm not totally sure that this is a net win. I think there are arguments for and against: - Speculative memory reads can be partly emulated for vectorized code on paged memory architectures. As long as the first byte read in a memory page happens for real, then we know that a...
- 2022-06-21, 22:16:10
- Forum: forwardcom forum
- Topic: Casey Muratori stream about Vector lane masking
- Replies: 0
- Views: 30099
Casey Muratori stream about Vector lane masking
Casey Muratori recently had a stream about what seems to be a design mistake in the very new RISC-V SIMD proposal: https://twitter.com/cmuratori/status/1538622391307251713 Basically, it can only use v0 (the first vector register) for masking, and it only uses the first bits of v0 as mask where each ...
- 2022-05-19, 21:26:33
- Forum: forwardcom forum
- Topic: Bit addressing
- Replies: 8
- Views: 43128
Re: Bit addressing
There's presumably something nice to be done with succinct trees, as per the paper you linked. I like how they avoid having strings of pointers (which are a massive problem, speed-wise). But this doesn't imply that you should do bit addressing: instead, it's probably optimal to store the tree-structu...
- 2022-05-16, 14:06:23
- Forum: forwardcom forum
- Topic: Bit addressing
- Replies: 8
- Views: 43128
Re: Bit addressing
Some STM32 microcontrollers have bit-banded memory aliased regions for that purpose: https://micromouseonline.com/2010/07/14/bit-banding-in-the-stm32/ For instance, the memory byte 0x20000000 also appears as eight single-bit word aliases at 0x22000000 0x22000004 0x22000008 0x2200000C 0x22000010 0x22000...
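The address mapping can be sketched as below. Note that in the Cortex-M bit-band scheme each bit gets its own 32-bit alias word, so consecutive bits of a byte sit 4 bytes apart in the alias region. The macro and function names here are made up for the example; the two base addresses are the SRAM bit-band region and its alias region.

```c
#include <stdint.h>

#define SRAM_BASE        0x20000000u  /* start of the SRAM bit-band region */
#define SRAM_ALIAS_BASE  0x22000000u  /* start of the bit-band alias region */

/* Alias word address for one bit of one SRAM byte:
   alias = alias_base + byte_offset * 32 + bit_number * 4 */
uint32_t bitband_alias(uint32_t byte_addr, unsigned bit)
{
    return SRAM_ALIAS_BASE + (byte_addr - SRAM_BASE) * 32u + bit * 4u;
}
```

On real hardware, writing 0 or 1 to the returned address (as a volatile uint32_t pointer) clears or sets just that one bit, without a read-modify-write sequence in software.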
- 2022-04-12, 20:24:13
- Forum: forwardcom forum
- Topic: Memory safety enforcement using CHERI
- Replies: 3
- Views: 25797
Re: Memory safety enforcement using CHERI
I gotta admit, I haven't seen other stuff like that yet. Interesting goal, tracking the allowed range of all memory addresses and if they're heap/stack/global data, and even object encapsulation. Not quite sure what to think of it, it reminds me of 16-bit x86's protected mode FAR pointers (and the h...
- 2022-01-24, 19:18:11
- Forum: forwardcom forum
- Topic: Nonlocal control flow
- Replies: 10
- Views: 42137
Re: Nonlocal control flow
I imagine these usages could be done with an OS call but wouldn't require a full interrupt. So you'd have an intermediary performance level (privilege level change, but no full pipeline-flush-and-context-switch from an interrupt). Which I guess still makes sense since longjmp and triggering exceptio...
- 2021-12-19, 23:05:43
- Forum: forwardcom forum
- Topic: How to avoid memory fragmentation
- Replies: 5
- Views: 22912
Re: How to avoid memory fragmentation
Looking at the memory layout of various programs using Microsoft's VMMap tool ( https://docs.microsoft.com/en-us/sysinternals/downloads/vmmap ), Windows does manage to keep some fairly large blocks of memory contiguous, often 10+ MB large. Very few memory blocks are just a single 4k page, typical blo...
- 2021-07-04, 16:11:16
- Forum: forwardcom forum
- Topic: Timer, DMA controller, Interrupt controller built into the architecture?
- Replies: 0
- Views: 29457
Timer, DMA controller, Interrupt controller built into the architecture?
One important part of how the PC became common was the standardized timers, DMA controllers and interrupt controllers based on the chips IBM used in early PCs: - Intel 8237 DMA controller (x2 in PC AT) - Intel 8259 IRQ controller (x2 in PC AT) - Intel 8253/8254 timer On modern PCs these were supplem...
- 2021-06-28, 20:56:55
- Forum: forwardcom forum
- Topic: Universal boolean instruction
- Replies: 5
- Views: 21573
Re: Universal boolean instruction
I imagine that Intel's implementation is probably just a whole bunch of 8:1 multiplexers (and high fanout buffers), and I think that part of the idea is that the compiler could fold any number of operations on the same 3 inputs by changing the 8 entry lookup table.
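To make the lookup-table idea concrete, here is a rough software model of a 3-input universal boolean operation. It is only a sketch: the ternlog name and the choice of a as the most significant index bit are assumptions of this example. Each bit position uses its three input bits as a 3-bit index into the 8-entry table, which is what an 8:1 multiplexer per bit would do.

```c
#include <stdint.h>

/* Bitwise 3-input boolean function selected by an 8-entry truth table.
   ASSUMED index order: a is bit 2, b is bit 1, c is bit 0 of the index. */
uint64_t ternlog(uint64_t a, uint64_t b, uint64_t c, uint8_t table)
{
    uint64_t result = 0;
    for (unsigned bit = 0; bit < 64; bit++) {
        unsigned index = (unsigned)((((a >> bit) & 1) << 2) |
                                    (((b >> bit) & 1) << 1) |
                                     ((c >> bit) & 1));
        /* Select one table entry, like an 8:1 mux per bit position. */
        result |= (uint64_t)((table >> index) & 1u) << bit;
    }
    return result;
}
```

With this index order, table 0x96 gives a ^ b ^ c, 0xE8 gives the majority function, and 0xCA gives the bitwise select a ? b : c, so a chain of two or more operations on the same 3 inputs folds into a single table constant.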
- 2021-06-28, 2:11:14
- Forum: forwardcom forum
- Topic: input/output instructions
- Replies: 8
- Views: 30071
- 2021-06-24, 16:06:03
- Forum: forwardcom forum
- Topic: Macro-op fusion as an intentional instruction set design choice
- Replies: 4
- Views: 20209
Re: Macro-op fusion as an intentional instruction set design choice
One extra question here... Not to be too inquisitive here, but I was wondering what your exact motivation is for doing LOAD+MATH in a single instruction instead of a LOAD and MATH instruction sequence on ForwardCom. Is it: - Is the goal to build in-order CPU pipelines like the 1st generation Intel A...
- 2021-06-24, 15:10:00
- Forum: forwardcom forum
- Topic: Default integer size 32 or 64 bits?
- Replies: 7
- Views: 24640
Re: Default integer size 32 or 64 bits?
Can you make it a flag in the cpu? Like a mode? 32/64 bit mode. When reading some instructions it can convert them before anything else happens. I don't know anything about CPUs. Normally, 32/64/32-in-64 bits for memory addresses/pointers is a CPU flag yes, as it affects a ton of other stuff (page ...
- 2021-06-16, 17:26:06
- Forum: forwardcom forum
- Topic: Load From Const Array Instruction
- Replies: 3
- Views: 9441
Re: Load From Const Array Instruction
It's an interesting idea, for sure. Though I think it would require a 3rd cache - I don't think it can share the instruction cache: - The instruction cache doesn't do word addressing. It loads whole 128-bit or 256-bit aligned chunks, which are then queued into the so-called "prefetch queue" (...
- 2021-04-13, 20:48:42
- Forum: forwardcom forum
- Topic: Macro-op fusion as an intentional instruction set design choice
- Replies: 4
- Views: 20209
Re: Macro-op fusion as an intentional instruction set design choice
I imagine that ARM decided on specifically making 4 operand operations into a fusable two-instruction sequence in order to avoid introducing 64bit opcodes in ARM64 when everything else is 32bit only, and to have an escape chute in case nobody used SVE (or if they have so many register file write por...
- 2021-04-12, 18:27:55
- Forum: forwardcom forum
- Topic: Macro-op fusion as an intentional instruction set design choice
- Replies: 4
- Views: 20209
Macro-op fusion as an intentional instruction set design choice
Arm v9 (basically adding Scalable Vector Extensions to the main instruction set) has an interesting new instruction called MOVPRFX: https://developer.arm.com/documentation/ddi0602/latest/SVE-Instructions/MOVPRFX--unpredicated---Move-prefix--unpredicated--?lang=en The problem that they had is that wi...
- 2021-03-22, 14:53:36
- Forum: forwardcom forum
- Topic: Default integer size 32 or 64 bits?
- Replies: 7
- Views: 24640
Re: Default integer size 32 or 64 bits?
Arm64 has a fairly extensive solution to this, so it might make sense to look it up. One solution is to use 64 bit instructions to do 32 bit math and let the top 32 bits be junk, except for specific cases (generally operations that propagate bits rightwards): - Right shift and arithmetic right shift...
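The "junk in the top 32 bits" idea can be illustrated in C as below. This is only a sketch with made-up function names, and it assumes shift counts below 32. Addition only propagates bits leftwards, so the low 32 bits of the result are correct regardless of the upper half; a right shift pulls the junk down into the low half unless the upper bits are cleared first.

```c
#include <stdint.h>

/* 32-bit add done with a 64-bit add: junk upper bits never reach the low half. */
uint32_t add32_in64(uint64_t a, uint64_t b)
{
    return (uint32_t)(a + b);
}

/* 32-bit logical right shift: the upper half must be zeroed first,
   otherwise junk bits are shifted into the low 32 bits. Assumes n < 32. */
uint32_t shr32_in64(uint64_t a, unsigned n)
{
    return (uint32_t)((a & 0xFFFFFFFFu) >> n);
}

/* Naive version, shown only to demonstrate the problem. Assumes n < 32. */
uint32_t shr32_in64_naive(uint64_t a, unsigned n)
{
    return (uint32_t)(a >> n);
}
```

For example, with 0xDEADBEEF in the upper half, adding 5 and 7 still gives 12 in the low 32 bits, while the naive shift of 0x80000000 by 4 gives 0xF8000000 instead of the expected 0x08000000.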
- 2021-03-21, 15:46:23
- Forum: forwardcom forum
- Topic: Rollbackable L1 Data Cache Design?
- Replies: 7
- Views: 16857
Re: Rollbackable L1 Data Cache Design?
Forcing store operations to never be speculative simplifies a lot of these tasks (it kinda turns the cpu into a partially in-order CPU?) but I'm not sure I can come up with a kind of architecture that could do this without causing huge stalls... Assuming no other core is dependent on the data of a ...
- 2021-03-19, 15:14:07
- Forum: forwardcom forum
- Topic: Rollbackable L1 Data Cache Design?
- Replies: 7
- Views: 16857
Re: Rollbackable L1 Data Cache Design?
In most cases, yes. But not in all cases... You can always pass unaligned pointers to other functions... Though I guess that kind of case can always be handled with a Memory Aliasing Fault, and letting the OS do the split memory access for you and cobbling together the result... Slow, but should hap...
- 2021-03-14, 22:03:09
- Forum: forwardcom forum
- Topic: Rollbackable L1 Data Cache Design?
- Replies: 7
- Views: 16857
Re: Rollbackable L1 Data Cache Design?
Perhaps the way unaligned loads/stores could be handled is through an alignment predictor... All memory loads/stores are initially predicted to be aligned, and if a memory operation ends up being unaligned, it triggers a branch prediction fail and the load/store operation is recorded in the alignmen...
- 2021-03-12, 16:52:05
- Forum: forwardcom forum
- Topic: Rollbackable L1 Data Cache Design?
- Replies: 7
- Views: 16857
Rollbackable L1 Data Cache Design?
Looking at the problem of how to build fast CPU cores, I've come to the conclusion that the key component that differentiates the big boys is to have an L1 data cache that supports rolling back operations that haven't graduated/committed. The simpler CPUs that have this kind of L1 Cache (such as the ...