Slice-out-of-order (sOoO) cores are restricted out-of-order machines that add modest hardware overhead to a stall-on-use in-order core to improve instruction-level parallelism (ILP) as well as memory-hierarchy parallelism (MHP). Load Slice Core (LSC) was the first work to propose an sOoO core. Freeway builds upon the LSC proposal and exposes more MHP than LSC by adding one more in-order queue for uncovering additional independent loads.
# Slice it forward plus
Mobile and edge devices need increasingly high performance at low cost and low power consumption. In particular, the number of smartphone users is continuously increasing, reaching close to 4 billion users around the world this year; further, projections estimate 50 billion Internet-of-Things (IoT) devices by 2030; finally, the 5G market is expected to involve 666 million devices. Although in-order cores are highly energy-efficient, their in-program-order execution model severely restricts performance compared to OoO cores. Recently, slice-out-of-order (sOoO) core microarchitectures have been proposed to address the in-order issue bottleneck by allowing the execution of load and store instructions, plus their backward slices (i.e., the address-generating sequence of instructions leading up to these memory operations), to bypass arithmetic instructions in the dynamic instruction stream.
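To make the notion of a backward slice concrete, here is a minimal software sketch that walks a toy instruction trace in reverse and marks the address-generating instructions that feed loads and stores. The `Instr` record, its field names, and the register names are assumptions made for illustration only; real sOoO cores identify slices in hardware, and this sketch does not model how LSC or Freeway do it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical toy instruction; not an encoding used by any real core."""
    op: str                           # "load", "store", or "alu"
    dst: Optional[str]                # destination register, if any
    addr_srcs: Tuple[str, ...] = ()   # registers used to form a memory address
    srcs: Tuple[str, ...] = ()        # all other source registers


def backward_slices(trace):
    """Return indices of instructions that produce (directly or
    transitively) a register used to compute a load/store address."""
    in_slice = set()
    needed = set()  # registers whose most recent producer belongs to a slice
    for i in range(len(trace) - 1, -1, -1):
        ins = trace[i]
        if ins.dst is not None and ins.dst in needed:
            in_slice.add(i)
            needed.discard(ins.dst)
            needed.update(ins.addr_srcs + ins.srcs)
        if ins.op in ("load", "store"):
            needed.update(ins.addr_srcs)
    return in_slice


trace = [
    Instr("alu", "r1", srcs=("r0",)),        # 0: r1 = r0 << 3   (address math)
    Instr("alu", "r2", srcs=("r1", "r9")),   # 1: r2 = r1 + r9   (address math)
    Instr("alu", "r5", srcs=("r6", "r7")),   # 2: r5 = r6 * r7   (plain arithmetic)
    Instr("load", "r3", addr_srcs=("r2",)),  # 3: r3 = mem[r2]
    Instr("alu", "r4", srcs=("r3", "r5")),   # 4: r4 = r3 + r5   (load consumer)
]

slice_indices = backward_slices(trace)
print(sorted(slice_indices))
```

In this trace, instructions 0 and 1 form the backward slice of the load at index 3. Together with the load itself, they are the instructions an sOoO core would allow to bypass the arithmetic at index 2.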
Modern processors are designed to either deliver high performance or provide high energy efficiency. The two ends of the spectrum are represented by superscalar out-of-order (OoO) and in-order (InO) cores, respectively. To deliver high performance, OoO cores are power-hungry due to their high design complexity and large chip area. InO cores, on the other hand, consume significantly less power as a consequence of their much simpler design and smaller chip area. An ideal processor design, however, should deliver high performance at a small chip area and power overhead. This is of particular importance in the huge and continuously growing mobile and embedded markets.

Recently proposed slice-out-of-order (sOoO) cores identify backward slices of memory operations, which they execute out of order with respect to the rest of the dynamic instruction stream for increased instruction-level and memory-hierarchy parallelism. Unfortunately, constructing backward slices is imprecise and hardware-inefficient, leaving performance on the table. In this article, we propose Forward Slice Core (FSC), a novel core microarchitecture that builds on a stall-on-use in-order core and extracts more instruction-level and memory-hierarchy parallelism than slice-out-of-order cores. FSC does so by identifying and steering forward slices (rather than backward slices) to dedicated in-order FIFO queues. Moreover, FSC puts load-consumers that depend on L1 D-cache misses on the side to enable younger independent load-consumers to execute faster. Finally, FSC eliminates the need for dynamic memory disambiguation by replicating store-address instructions across queues. Considering 3-wide pipeline configurations, we find that FSC improves performance by 27.1%, 21.1%, and 14.6% on average compared to Freeway, the state-of-the-art sOoO core, across SPEC CPU2017, GAP, and DaCapo, respectively, while at the same time incurring reduced hardware complexity. Compared to an OoO core, FSC reduces power consumption by 61.3% and chip area by 47%, providing a microarchitecture with high performance at low complexity.
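The idea of a forward slice can be illustrated with a short software sketch: instead of tracing backwards from a memory operation to its address producers, one propagates a "depends on a load" mark forward through destination registers. The tuple-based instruction format, the register names, and the two queue names `main` and `forward_slice` are hypothetical and chosen only for this example. The text does not specify FSC's actual hardware structures or number of queues, and this sketch does not model L1 D-cache misses or store-address replication.

```python
def forward_slice(trace):
    """Return indices of instructions that consume, directly or
    transitively, a value produced by a load.

    Each instruction is a (op, dst, srcs) tuple; for loads, srcs holds
    the address registers.
    """
    tainted = set()   # registers currently holding load-dependent values
    in_slice = []
    for i, (op, dst, srcs) in enumerate(trace):
        depends = any(r in tainted for r in srcs)
        if depends:
            in_slice.append(i)
        if dst is not None:
            if op == "load" or depends:
                tainted.add(dst)
            else:
                tainted.discard(dst)  # overwritten by a load-independent value
    return in_slice


def steer(trace):
    """Split instruction indices between two hypothetical in-order queues."""
    slice_set = set(forward_slice(trace))
    queues = {"main": [], "forward_slice": []}
    for i in range(len(trace)):
        queues["forward_slice" if i in slice_set else "main"].append(i)
    return queues


trace = [
    ("load", "r1", ("r0",)),       # 0: r1 = mem[r0]
    ("alu",  "r2", ("r1",)),       # 1: consumes the load at 0
    ("alu",  "r3", ("r4", "r5")),  # 2: independent arithmetic
    ("load", "r6", ("r3",)),       # 3: independent load (address not load-derived)
    ("alu",  "r7", ("r2", "r6")),  # 4: consumes load-dependent values
    ("alu",  "r1", ("r4",)),       # 5: overwrites r1 with an independent value
    ("alu",  "r8", ("r1",)),       # 6: reads the new r1, so it is independent
]

queues = steer(trace)
print(queues)
```

Because each queue is kept in program order, the independent load at index 3 sits in a different queue from the load consumer at index 1 and need not wait behind it, which is the intuition for how steering forward slices can expose more memory-hierarchy parallelism.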
Superscalar out-of-order cores deliver high performance at the cost of increased complexity and power budget. In-order cores, in contrast, are less complex and have a smaller power budget, but offer low performance. A processor architecture should ideally provide high performance in a power- and cost-efficient manner.