## **Evolution of Compiler and Multiprocessors with Accelerators**



## Prof. Hironori Kasahara, IEEE Life-Fellow, IPSJ Fellow, Senior Executive Vice President (2018-2022), Waseda University




IEEE Computer Society President 2018, Board Member: The Academy of Engineering of Japan

URL: http://www.kasahara.cs.waseda.ac.jp/

1980 BS, 82 MS, 85 Ph.D., Dept. EE, Waseda Univ., 1985 Visiting Scholar: U. of California, Berkeley, 1986 Assistant Prof., 1988 Associate Prof., 1989-90 Research Scholar: U. of Illinois, Urbana-Champaign, Center for Supercomputing R&D, 1997 Prof., 2004 Director, Advanced Multicore Research Institute, 2017 member: the Engineering Academy of Japan (2020- Board Mem.) and the Science Council of Japan, **2018 IEEE Computer Society President**, Senior Vice President, Waseda Univ. (2018 Nov.-2022 Sept.)

**AWARD: 1987 IFAC World Congress Young Author Prize**, 1997 IPSJ Sakai Special Research Award, 2005 STARC Academia-Industry Research Award, 2008 LSI of the Year Second Prize, 2008 Intel Asia Academic Forum Best Research Award, 2010 IEEE CS Golden Core Member Award, 2014 Minister of Edu., Sci. & Tech. Research Prize, 2015 IPSJ Fellow, 2017 IEEE Fellow, Eta Kappa Nu, 2019 Spirit of IEEE Computer Society Award, **2020 IPSJ Contribution Award**

Reviewed Papers: 232, Invited Talks: 230, Granted Patents: 70 (Japan, US, GB, DE, China), Articles in Newspapers, Web News, TV, etc.: 697

**Committees in Societies and Government: 287**

- **IEEE Computer Society:** President 2018, Executive Committee (2017-2019), BoG (2009-14), Strategic Planning Committee Chair 2018, Multicore STC Chair (2012-), Japan Chair (2005-07)
- **IPSJ:** Chair: HG for Magazine. & J. Edit, Sig. on ARC.
- **[METI/NEDO]** Project Leaders: Multicore for **Consumer Electronics, Advanced Parallelizing Compiler**, Chair: Computer Strategy Committee
- **[Cabinet Office]** CSTP Supercomputer Strategic ICT PT, Japan Prize Selection Committees, etc.
- **[MEXT]** Info. Sci. & Tech. Committee, Supercomputers (Earth Simulator, HPCI Promo., Next Gen. Supercomputer K) Committees
- **JST** Moonshot Project G3 Robot & AI Vice Chair
- **[COCN]** Board Member in Council of Competitiveness Nippon, etc.

The 36<sup>th</sup> LCPC2023, Panel: Evolution of Parallel Architecture Targets, Oct. 12, 2023, Univ. of Kentucky

# **Heterogeneous Multicore Architecture** targeted by OSCAR API



# **OSCAR Green Vector Multicore and Compiler: from AI Robots, Automobiles, and Smartphones to Data Centers and Supercomputers**

Target:



- **Solar Powered**
- **Compiler power reduction.**

>Fully automatic parallelization and vectorization including local memory management and data transfer.

#### **Vector Accelerator**

#### Features

- Attachable to any CPU (Intel, ARM, IBM)
- Data-driven initiation by sync flags



#### **Function Units [tentative]**

- **Vector Function Unit** 
  - 8 double-precision ops/clock
  - 64 character ops/clock
  - Variable vector register length
  - Chaining LD/ST & Vector pipes
- **Scalar Function Unit**

#### **Registers [tentative]**

- Vector Register: 256 bytes/entry, 32 entries
- Scalar Register: 8 bytes/entry
- Floating Point Register: 8 bytes/entry
- Mask Register: 32 bytes/entry
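
The tentative figures above imply a few derived quantities, worked out in the short sketch below. It is back-of-envelope arithmetic only: it assumes 1-byte characters and 8-byte doubles, ignores pipeline startup and chaining effects, and all constant names are made up for illustration.

```python
# Tentative figures copied from the slide; every derived value below is an
# illustrative calculation, not a published specification.
VECTOR_REG_BYTES = 256      # bytes per vector register entry
VECTOR_REG_ENTRIES = 32     # number of vector register entries
MASK_REG_BYTES = 32         # bytes per mask register entry
DOUBLE_BYTES = 8            # assumption: IEEE 754 double precision
CHAR_BYTES = 1              # assumption: 1-byte characters
DOUBLE_OPS_PER_CLOCK = 8
CHAR_OPS_PER_CLOCK = 64

# Total vector register file capacity.
vector_file_bytes = VECTOR_REG_BYTES * VECTOR_REG_ENTRIES

# Maximum elements held by one full-length vector register.
doubles_per_register = VECTOR_REG_BYTES // DOUBLE_BYTES
chars_per_register = VECTOR_REG_BYTES // CHAR_BYTES

# One mask entry holds this many bits, which matches one bit per
# character element of a full vector register.
mask_bits = MASK_REG_BYTES * 8

# Clocks to issue one full-length vector operation at peak rate.
clocks_per_double_vector = doubles_per_register // DOUBLE_OPS_PER_CLOCK
clocks_per_char_vector = chars_per_register // CHAR_OPS_PER_CLOCK

print(vector_file_bytes, doubles_per_register, chars_per_register,
      mask_bits, clocks_per_double_vector, clocks_per_char_vector)
```

Under these assumptions the vector register file is 8 KiB, one register holds 32 doubles or 256 characters, and a full-length vector operation occupies the vector unit for 4 clocks in either data type.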

# **Software Cache Coherence Control**

- Software Coherence for Specific Purpose Multicore Chips:
  - Hardware coherence control is getting expensive.
  - Directory-based CC-NUMA is very effective, but its overhead is large. Software coherence is effective for specific purpose machines.
    - Experience with the OSCAR Compiler showed that software coherence control gave more efficient execution than hardware coherence for some applications.
    - For specific applications, we can use software coherence and power gate the hardware coherence controller for lower power and faster execution.



**Automatic Local Memory Management by Compiler** 

**Data Localization: Loop Aligned Decomposition** 

> Hard real-time applications: automobiles, AI robots, etc.

**Reproducibility** is also required. So, automobile companies need software task scheduling and efficient use of limited-size local memory.

## **Single dimension Decomposition**



# **Block Replacement Policy** by OSCAR Compiler

- **Compiler-Controlled Memory Block Replacement**
  - uses live, dead, and reuse information for each variable, taken from the scheduled result
  - unlike LRU in a cache, which does not use data dependence information

### Block Eviction Priority Policy

1. (Dead) Variables that will not be accessed later in the program
2. Variables that are accessed only by other processor cores
3. Variables that will be accessed later by the current processor core
4. Variables that will be accessed immediately by the current processor core
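
The priority order above can be sketched as a small victim-selection routine. This is a toy model, not the OSCAR compiler's implementation: the `Block` fields, the `immediate_window` threshold for what counts as "immediately", and the tie-breaking by list order are all assumptions made for illustration. In the real compiler these decisions would come from the static schedule rather than from a runtime search.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List, Optional


class EvictionClass(IntEnum):
    """Lower value = evicted first, following the four-level policy."""
    DEAD = 1             # never accessed again anywhere
    REMOTE_ONLY = 2      # accessed later, but only by other cores
    LOCAL_LATER = 3      # accessed later by this core
    LOCAL_IMMEDIATE = 4  # accessed immediately by this core


@dataclass
class Block:
    # Hypothetical summary of the live/dead/reuse information that the
    # compiler would derive from the scheduled result.
    name: str
    next_local_use: Optional[int]  # schedule step of next use on this core
    used_by_other_core: bool       # True if another core accesses it later


def classify(block: Block, now: int, immediate_window: int = 1) -> EvictionClass:
    """Map a block to its eviction class at schedule step `now`."""
    if block.next_local_use is None:
        if block.used_by_other_core:
            return EvictionClass.REMOTE_ONLY
        return EvictionClass.DEAD
    if block.next_local_use - now <= immediate_window:
        return EvictionClass.LOCAL_IMMEDIATE
    return EvictionClass.LOCAL_LATER


def choose_victim(blocks: List[Block], now: int, immediate_window: int = 1) -> Block:
    """Pick the block with the lowest eviction class (ties: list order)."""
    return min(blocks, key=lambda b: classify(b, now, immediate_window))


resident = [
    Block("a", next_local_use=5, used_by_other_core=False),
    Block("b", next_local_use=None, used_by_other_core=True),
    Block("c", next_local_use=None, used_by_other_core=False),
    Block("d", next_local_use=1, used_by_other_core=False),
]
eviction_order = []
while resident:
    victim = choose_victim(resident, now=0)
    eviction_order.append(victim.name)
    resident.remove(victim)
print(eviction_order)
```

With the sample data, the dead block `c` is evicted first, then the remote-only block `b`, then `a` (needed later on this core), and last `d` (needed at the next step). An LRU cache could not make this distinction because it sees only past accesses, not the schedule.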

**Back to the Future Control** 

