### **COMPSAC 2016 Plenary Panel, Rebooting Computing: Future of Architecture and Software**

## **Multicore Software and Architecture**

### Hironori Kasahara

Professor, Dept. of Computer Science & Engineering; Director, Advanced Multicore Processor Research Institute, Waseda University, Tokyo, Japan; IEEE Computer Society Multicore STC Chair. URL: http://www.kasahara.cs.waseda.ac.jp/

Waseda Univ. GCSC

# Performance and Low Power are Key Issues

Power consumption is one of the biggest obstacles to performance scaling, from smartphones to cloud servers and supercomputers (the "K" computer consumes more than 10 MW).



IEEE ISSCC08: Paper No. 4.5, M. Ito, ... and H. Kasahara, "An 8640 MIPS SoC with Independent Power-off Control of 8 CPUs and 8 RAMs by an Automatic Parallelizing Compiler"

Power $\propto$ Frequency $\times$ Voltage$^2$ (Voltage $\propto$ Frequency)

▶ Power $\propto$ Frequency$^3$

If Frequency is reduced to 1/4 (e.g., 4 GHz → 1 GHz), Power is reduced to 1/64 and Performance falls to 1/4.

**Multicores:** If 8 cores are integrated on a chip, total Power is still only 1/8 (8 × 1/64) and **Performance** becomes 2 times higher (8 × 1/4).
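The arithmetic above can be checked with a tiny model. This is a minimal sketch assuming, as the slide does, that per-core power scales as the cube of frequency and that performance scales linearly with both frequency and core count (i.e., perfect parallel speedup); the function names are hypothetical.

```python
from fractions import Fraction

def relative_power(freq_scale: Fraction, cores: int = 1) -> Fraction:
    """Chip power relative to one core at full frequency, assuming P ∝ f^3 per core."""
    return cores * freq_scale ** 3

def relative_performance(freq_scale: Fraction, cores: int = 1) -> Fraction:
    """Throughput relative to one core at full frequency, assuming perfect parallel scaling."""
    return cores * freq_scale

if __name__ == "__main__":
    quarter = Fraction(1, 4)  # e.g., 4 GHz -> 1 GHz
    print(relative_power(quarter), relative_performance(quarter))        # 1/64 1/4
    print(relative_power(quarter, 8), relative_performance(quarter, 8))  # 1/8 2
```

Exact fractions are used so that the 1/64, 1/8 and 2x figures come out without floating-point rounding. Real chips deviate from this model because of static (leakage) power and imperfect parallel scaling.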



With 128 cores, the OSCAR compiler achieved a 100-times speedup over 1-core execution, and a 211-times speedup over 1-core execution compiled with the Sun (Oracle) Studio compiler.

### **OSCAR Parallelizing Compiler**

#### To improve effective performance, cost-performance, and software productivity, and to reduce power

#### **Multigrain Parallelization**

Coarse-grain parallelism among loops and subroutines and near-fine-grain parallelism among statements, in addition to loop parallelism
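To illustrate coarse-grain parallelism, the sketch below groups hypothetical macro-tasks (loops, subroutine calls, blocks of statements) into "waves" whose members have no dependences on each other and could therefore run on different cores at the same time. This is a simplified level-based scheduling assumption, not the OSCAR compiler's actual analysis: it considers only data dependences and ignores control dependences and task costs, which a real parallelizing compiler must handle.

```python
def parallel_levels(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group macro-tasks into waves; tasks in the same wave have no
    dependences among themselves and could run on different cores."""
    # Collect every task, including ones that appear only as predecessors.
    tasks = set(deps)
    for preds in deps.values():
        tasks |= preds
    remaining = {t: set(deps.get(t, set())) for t in tasks}
    levels = []
    while remaining:
        # A task is ready once all of its predecessors have been scheduled.
        ready = sorted(t for t, preds in remaining.items() if not preds)
        if not ready:
            raise ValueError("dependence cycle detected")
        levels.append(ready)
        for t in ready:
            del remaining[t]
        for preds in remaining.values():
            preds.difference_update(ready)
    return levels

if __name__ == "__main__":
    # Hypothetical macro-task graph: MT2 and MT3 both depend on MT1; MT4 needs both.
    deps = {"MT1": set(), "MT2": {"MT1"}, "MT3": {"MT1"}, "MT4": {"MT2", "MT3"}}
    print(parallel_levels(deps))  # [['MT1'], ['MT2', 'MT3'], ['MT4']]
```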

#### **Data Localization**

Automatic data management for distributed shared memory, cache and local memory
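One way to picture data localization is to split a loop so that each chunk's working set fits in a core's local memory, then copy the chunk in, compute on the local copy, and write the result back. The sketch below does this for `a[i] = b[i] + c[i]`; the 16 KiB local-memory size and all function names are hypothetical, and Python list slicing stands in for the DMA transfers a real compiler would generate.

```python
LOCAL_MEM_BYTES = 16 * 1024  # hypothetical per-core local memory size
ELEM_BYTES = 8               # one double-precision element

def chunk_ranges(n_iters, bytes_per_iter, local_mem_bytes=LOCAL_MEM_BYTES):
    """Split iterations 0..n_iters-1 into contiguous ranges whose data fits in local memory."""
    if bytes_per_iter <= 0 or bytes_per_iter > local_mem_bytes:
        raise ValueError("one iteration's data must fit in local memory")
    step = local_mem_bytes // bytes_per_iter
    return [range(s, min(s + step, n_iters)) for s in range(0, n_iters, step)]

def vector_add_localized(b, c):
    """Compute a[i] = b[i] + c[i] chunk by chunk, touching only 'local' copies."""
    n = len(b)
    a = [0.0] * n
    # Each iteration touches one element each of a, b and c.
    for r in chunk_ranges(n, 3 * ELEM_BYTES):
        lb = b[r.start:r.stop]                # load into local memory (a DMA on real hardware)
        lc = c[r.start:r.stop]
        la = [x + y for x, y in zip(lb, lc)]  # compute only on local data
        a[r.start:r.stop] = la                # write the chunk back to shared memory
    return a
```

With 24 bytes per iteration, 16384 // 24 = 682 iterations fit in one chunk, so a 1000-iteration loop is split into two chunks.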

#### **Data Transfer Overlapping**

Overlapping data transfers with computation using Data Transfer Controllers (DMAs)
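The overlapping idea is essentially double buffering: while the core computes on chunk *i*, the transfer controller is already fetching chunk *i+1*. In the sketch below a single worker thread plays the role of the data transfer controller; `dma_load` and `sum_with_overlap` are hypothetical names. In CPython the GIL limits how much real overlap a thread can achieve, so the point is the schedule of requests, not measured speedup.

```python
from concurrent.futures import ThreadPoolExecutor

def dma_load(shared, r):
    """Stand-in for a DTC/DMA transfer: copy one chunk from shared to local memory."""
    return shared[r.start:r.stop]

def sum_with_overlap(shared, chunk):
    """Sum `shared` chunk by chunk, prefetching the next chunk while computing."""
    ranges = [range(s, min(s + chunk, len(shared))) for s in range(0, len(shared), chunk)]
    total = 0
    if not ranges:
        return total
    # One worker acts as the data transfer controller.
    with ThreadPoolExecutor(max_workers=1) as dtc:
        pending = dtc.submit(dma_load, shared, ranges[0])
        for i in range(len(ranges)):
            local = pending.result()  # wait until the current chunk has arrived
            if i + 1 < len(ranges):
                # Start fetching the next chunk before computing on this one.
                pending = dtc.submit(dma_load, shared, ranges[i + 1])
            total += sum(local)       # computation overlaps the in-flight transfer
    return total
```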

#### **Power Reduction**

Reduction of power consumption by compiler-controlled DVFS and power gating, with hardware support.
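A simplified illustration of compiler-controlled DVFS: if the compiler knows a task's cycle count and its deadline, it can pick the lowest frequency that still meets the deadline and power-gate the core once the task finishes. The frequency levels and function names below are hypothetical, and the energy model reuses the slide's assumption that power ∝ frequency³, so energy ∝ f³ × (cycles / f) = cycles × f². This is not the OSCAR compiler's actual policy.

```python
FREQ_LEVELS_GHZ = (0.5, 1.0, 1.5, 2.0)  # hypothetical DVFS operating points

def pick_frequency(gigacycles: float, deadline_s: float, levels=FREQ_LEVELS_GHZ) -> float:
    """Lowest frequency (GHz) that finishes the task within the deadline."""
    for f in sorted(levels):
        if gigacycles / f <= deadline_s:
            return f
    raise ValueError("deadline cannot be met at any frequency level")

def relative_energy(f_ghz: float, f_max_ghz: float) -> float:
    """Energy relative to running at f_max, assuming power ∝ f^3 and that the core
    is power-gated (about 0 W) after finishing, so E ∝ cycles * f^2."""
    return (f_ghz / f_max_ghz) ** 2

if __name__ == "__main__":
    f = pick_frequency(1.2, 1.0)          # 1.2 gigacycles, 1-second deadline
    print(f, relative_energy(f, 2.0))     # 1.5 0.5625
```

Running at 1.5 GHz instead of 2.0 GHz still meets the deadline and, under this model, uses about 56% of the energy.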





### **Power of Multicores with DVFS can be Reduced by Software: Intel Haswell**





**Power with 1 core (29.3 W)** was reduced to **about 1/3 (9.6 W) with 3 cores by the OSCAR compiler.**

### Architecture Design to Support Parallelization and Power Reduction by the Compiler

**Vector Multicore for Embedded Systems to Servers**



Target: Solar powered with compiler power reduction. Fully automatic parallelization and vectorization, including local memory management and data transfer.

## 1992 Fujitsu VPP500/NWT: PE Unit

