# Future of High Performance & Low Power Multicore Technology

SEMICON JAPAN

Hironori Kasahara, IEEE Computer Society President 2018, Waseda University





# Hironori Kasahara

IEEE Computer Society President Elect 2017, President 2018

Professor, Dept. of Computer Science & Engineering

Director, Advanced Multicore Processor Research Institute

Waseda University, Tokyo, Japan

URL: http://www.kasahara.cs.waseda.ac.jp/

- 1980 BS, 1982 MS, 1985 Ph.D., Dept. EE, Waseda Univ.
- 1985 Visiting Scholar: U. of California, Berkeley
- 1986 Assistant Prof., 1988 Associate Prof., 1997 Prof., Dept. of EECE, Waseda Univ. Now Dept. of Computer Sci. & Eng.
- 1989-90 Research Scholar: U. of Illinois, Urbana-Champaign, Center for Supercomputing R&D
- 1987 IFAC World Congress Young Author Prize
- 1997 IPSJ Sakai Special Research Award
- 2005 STARC Academia-Industry Research Award
- 2008 LSI of the Year Second Prize
- 2008 Intel Asia Academic Forum Best Research Award
- 2010 IEEE CS Golden Core Member Award
- 2014 Minister of Edu., Sci. & Tech. Research Prize, 2015 IPSJ Fellow
- 2017 IEEE Fellow, Member of The Engineering Academy of Japan, Science Council of Japan, IEEE Eta-Kappa-Nu

Reviewed Papers: 214, Invited Talks: 145, Published Unexamined Patent Applications: 59 (Japan, US, GB, China; Granted Patents: 30), Articles in Newspapers, Web News, Media incl. TV etc.: 572

Committees in Societies and Government: 245

- IEEE Computer Society President 2018, BoG (2009-14), Multicore STC Chair (2012-), Japan Chair (2005-07)
- IPSJ Chair: HG for Mag. & J. Edit, Sig. on ARC.
- [METI/NEDO] Project Leaders: Multicore for Consumer Electronics, Advanced Parallelizing Compiler; Chair: Computer Strategy Committee
- [Cabinet Office] CSTP Supercomputer Strategic ICT PT, Japan Prize Selection Committees, etc.
- [MEXT] Info. Sci. & Tech. Committee, Supercomputers (Earth Simulator, HPCI Promo., Next Gen. Supercomputer K) Committees, etc.



### Hironori Kasahara Voted 2017 IEEE Computer Society **President-Elect**

LOS ALAMITOS, Calif., 30 September 2016 - Hironori Kasahara, a Professor of Computer Science at Waseda University in Tokyo, and Director of the Advanced Multicore Research Institute, has been voted IEEE Computer Society 2017 President-Elect.

Kasahara is a former member of the IEEE-CS Board of Governors, has served as chair of the IEEE-CS Multicore STC and CS Japan Chapter, and board member of the IEEE Tokyo Section. Kasahara will serve as the 2018 IEEE CS President for a one-year term beginning 1 January 2018. Kasahara garnered 3,278 votes, compared with 2,804 votes cast for Hausi A. Müller, a Professor of Computer Science and Associate Dean of Research, Faculty of Engineering at University of Victoria, Canada, and a member of IEEE-CS Board of Governors.

The President oversees IEEE-CS programs and operations and is a nonvoting member of most IEEE-CS program boards and committees. The 2016 election had a 12.69% turnout, with 6,357 ballots cast. The turnout was higher than the 2015 election with a 12.68% turnout (6,239 ballots cast) and the 2014 election with a 12.66% turnout (6,728 ballots cast).

### 2016 IEEE Computer Society Election Results


Posted 29 September 2016

### Hironori Kasahara selected 2017 President-Elect (2018 President)



Hironori Kasahara has served as a chair or member of 225 society and government committees, including a member of the CS Board of Governors; chair of CS Multicore STC and CS Japan chapter; associate editor of IEEE Transactions on Computers; vice PC chair of the 1996 ENIAC 50th Anniversary International Conference on Supercomputing; general chair of LCPC; PC member of SC, PACT, PPoPP, and ASPLOS; board member of IEEE Tokyo section; and member of the Earth Simulator committee.

He received a PhD in 1985 from Waseda University, Tokyo, joined its faculty in 1986, and has been a professor of computer science since 1997 and a director of the Advanced Multicore Research Institute since 2004. He was a visiting scholar at University of California, Berkeley, and the University of Illinois at Urbana-Champaign's Center for Supercomputing R&D.

Kasahara received the CS Golden Core Member Award, IFAC World Congress Young Author Prize, IPSJ Fellow and Sakai Special Research Award, and the Japanese Minister's Science and Technology Prize. He led Japanese national projects on parallelizing compilers and embedded multicores, and has presented 210 papers, 132 invited talks, and 27 patents. His research has appeared in 520 newspaper and Web articles.







# IEEE Computer Society BoG (Board of Governors) Feb.1, 2017







# **IEEE Computer Society**

- 60,000+ members, volunteer-led organization
- 200 technical conferences, industry-oriented "Rock Stars"
- 17 scholarly journals and 13 magazines, awards program
- Digital Library with more than 550,000 articles and papers
- 400 local and regional chapters, 40 technical committees







IEEE-USA (Regions 1-6)



### Collaboration of SEMICON and IEEE COMPSAC: 2019 Collocation in San Francisco


### COMPSAC 2018: Staying Smarter in a Smartening World

Tokyo, Japan - July 23-27

Computer technologies are producing profound changes in society. Emerging developments in areas such as Deep Learning, supported by increasingly powerful and increasingly miniaturized hardware, are beginning to be deployed in architectures, systems, and applications that are redefining the relationships between humans and technology. As this happens, humans are relinquishing their roles as masters of technology to partnerships wherein autonomous, computer-driven devices become our assistants. What are the technologies enabling these changes? How far can these partnerships go? What will be our future as we deploy more and more "things" on the Internet of Things – to create smart cities, smart vehicles, smart hospitals, smart homes, smart clothes, etc.? Will humans simply become IoT devices in these scenarios and if so, what will be the social, cultural, and economic challenges arising from these developments? What are the technologies making this all happen – for example, in terms of technologies such as Big Data, Cloud, Fog, Edge Computing, mobile computing, and pervasive computing in general? What will be the role of the 'user' as the 21st Century moves along?

COMPSAC 2018 is organized as a tightly integrated union of symposia, each of which will focus on technical aspects related to the "smart" theme of the conference. The technical program will include keynote addresses, research papers, industrial case studies, fast abstracts, a doctoral symposium, poster sessions, and workshops and tutorials on emerging and important topics related to the conference theme. A highlight of the conference will be plenary and specialized panels that will address the technical challenges facing technologists who are developing and deploying these smart systems and applications. Panels will also address cultural and societal challenges for a society whose members must continue to learn to live, work, and play in the environments the technologies produce. Authors are invited to submit original, unpublished research work, as well as industrial practice reports. Simultaneous submission to other publication venues is not permitted. All submissions must adhere to IEEE Publishing Policies, and all will be vetted through the IEEE CrossCheck Portal.

Conference venue: Hitotsubashi Hall and National Institute of Informatics, National Center of Sciences

Accommodation: KKR Hotel Tokyo

Conference reception - July 24: Gakushikaikan

Conference banquet - July 25: RIHGA Royal Hotel Tokyo (TBD)

<u>Local Chairs</u> Hironori Washizaki, Waseda Univ.; Nobukazu Yoshioka, NII



### **COMPSAC Organizers**

<u>Standing Committee Chair</u> Sorel Reisman, California State Univ., IEEE CS President 2011

<u>Steering Committee Chair</u> Sheikh Iqbal Ahamed, Marquette Univ.

<u>General Chairs</u> Shinichi Honiden, NII, Japan; Roger U. Fujii, IEEE CS President 2016











# Multicores for Performance and Low Power

Power consumption is one of the biggest problems for performance scaling from smartphones to cloud servers and supercomputers (the "K" computer consumes more than 10 MW).



IEEE ISSCC08: Paper No. 4.5, M.ITO, ... H. Kasahara, "An 8640 MIPS SoC with Independent Power-off Control of 8 CPUs and 8 RAMs by an Automatic Parallelizing Compiler"

$$\text{Power} \propto \text{Frequency} \times \text{Voltage}^2, \qquad \text{Voltage} \propto \text{Frequency} \;\Rightarrow\; \text{Power} \propto \text{Frequency}^3$$

If <u>Frequency</u> is reduced to 1/4 (4 GHz $\rightarrow$ 1 GHz), Power is reduced to 1/64 and Performance falls to 1/4.

<Multicores>

If 8 cores are integrated on a chip, Power is still 1/8 and Performance becomes 2 times.
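The arithmetic on this slide can be checked with a short sketch. It assumes the idealized model stated above (voltage scales with frequency, so power scales with frequency cubed) and perfect linear scaling of performance with core count. The function name `scaled_multicore` and its parameters are illustrative and do not come from the talk.

```python
from fractions import Fraction


def scaled_multicore(freq_ratio, cores):
    """Return (power, performance) relative to one core at full frequency.

    Assumptions (idealized, taken from the slide's model):
      - per-core power is proportional to frequency cubed
      - performance is proportional to frequency times core count
    """
    per_core_power = freq_ratio ** 3
    power = cores * per_core_power
    performance = cores * freq_ratio
    return power, performance


quarter = Fraction(1, 4)  # 4 GHz -> 1 GHz

# One core at 1/4 frequency: power 1/64, performance 1/4.
print(scaled_multicore(quarter, 1))

# Eight such cores: power 8/64 = 1/8, performance 8/4 = 2.
print(scaled_multicore(quarter, 8))
```

Real chips deviate from this model (leakage power, a minimum operating voltage, imperfect parallel scaling), so the numbers are an upper bound on the benefit rather than a prediction.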











# **Power Reduction of MPEG2 Decoding to 1/4 on 8 Core Homogeneous Multicore RP-2 by OSCAR Compiler**







## Demo of NEDO Multicore for Real Time Consumer Electronics at the Council for Science and Technology Policy on April 10, 2008

74th Council for Science and Technology Policy meeting (April 10, 2008)

Scene from the 74th Council for Science and Technology Policy meeting (3)

**CSTP** Members

- Prime Minister: Mr. Y. FUKUDA
- Minister of State for Science, Technology and Innovation Policy: Mr. F. KISHIDA
- Chief Cabinet Secretary: Mr. N. MACHIMURA
- Minister of Internal Affairs and Communications: Mr. H. MASUDA
- Minister of Finance: Mr. F. NUKAGA
- Minister of Education, Culture, Sports, Science and Technology: Mr. K. TOKAI
- Minister of Economy, Trade and Industry: Mr. A. AMARI





# Waseda Univ. Green Computing Systems R&D Center Supported by METI (Mar. 2011 Completion)

# <R & D Target>

Hardware, Software, Application for Super Low-Power Manycore

- More than 64 cores
- Natural air cooling (No fan): Cool, Compact, Clear, Quiet
- Operational by Solar Panel

<Industry, Government, Academia>

Hitachi, Fujitsu, NEC, Renesas, Olympus, Toyota, Denso, Mitsubishi, Toshiba, OSCAR Technology, etc.

<Ripple Effect>

- Low CO<sub>2</sub> (Carbon Dioxide) Emissions
- Creation of Value Added Products: Automobiles, Medical, IoT, Servers





Hitachi SR16000: Power7 128 core SMP

Fujitsu M9000: SPARC VII 256 core SMP



Beside Subway Waseda Station, Near Waseda Univ. Main Campus



# OSCAR Parallelizing Compiler

To improve effective performance, cost-performance and software productivity and reduce power

## Multigrain Parallelization (1991, 2001, 04)

Coarse-grain parallelism among loops and subroutines (2000 on SMP) and near-fine-grain parallelism among statements (1992), in addition to loop parallelism

## Data Localization

Automatic data management for distributed shared memory, cache and local memory (Local Memory 1995, 2016 on RP2, Cache 2001, 03)

Software Coherent Control (2017)

## Data Transfer Overlapping (2016 partially)

Data transfer overlapping using Data Transfer Controllers (DMAs)
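Why overlapping helps can be seen from a simple timing model. The sketch below is an illustration only, not the OSCAR compiler's scheduling algorithm: it assumes a task split into equal blocks, each needing one DMA transfer followed by one compute step, with double buffering so that the transfer of the next block runs while the current block is computed. The function names and the example times are hypothetical.

```python
def serial_time(n_blocks, transfer, compute):
    """Total time when each transfer must finish before its compute starts
    and nothing runs concurrently."""
    return n_blocks * (transfer + compute)


def overlapped_time(n_blocks, transfer, compute):
    """Total time for a two-stage pipeline with double buffering.

    The first transfer and the last compute cannot be hidden; in between,
    each step costs the slower of the two stages.
    """
    if n_blocks == 0:
        return 0
    return transfer + (n_blocks - 1) * max(transfer, compute) + compute


# Hypothetical example: 4 blocks, 2 time units per transfer, 3 per compute.
print(serial_time(4, 2, 3))      # 4 * (2 + 3) = 20
print(overlapped_time(4, 2, 3))  # 2 + 3 * 3 + 3 = 14
```

In this model the transfer cost is almost fully hidden whenever compute time per block is at least as long as transfer time per block.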

## **Power Reduction** (2005 for Multicore, 2011 Multi-processes, 2013 on ARM)

Reduction of power consumption by compiler-controlled DVFS and power gating with hardware support.
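One way to picture deadline-driven DVFS is the following minimal sketch. It is not the OSCAR compiler's method: it assumes a task with a known cycle count and a real-time deadline, a fixed set of frequency levels, and the idealized relation that power scales with frequency cubed, so energy for a fixed number of cycles scales with frequency squared. All names and numbers are hypothetical.

```python
def choose_frequency(cycles, deadline_s, levels_hz):
    """Return the lowest frequency (Hz) that finishes `cycles` within the
    deadline, or None if even the fastest level is too slow."""
    for f in sorted(levels_hz):
        if cycles / f <= deadline_s:
            return f
    return None


def relative_energy(f, f_max):
    """Energy relative to running at f_max, assuming power ~ f^3 and
    execution time ~ 1/f, hence energy ~ f^2."""
    return (f / f_max) ** 2


# Hypothetical task: 30 million cycles per video frame, 30 frames per second.
levels = [0.5e9, 1.0e9, 2.0e9]
f = choose_frequency(30e6, 1 / 30, levels)
print(f)                             # lowest level that still meets the deadline
print(relative_energy(f, max(levels)))
```

A core that has no work before the next deadline could additionally be power gated; that case is not modeled here.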









# Speedup for H.264 and Optical Flow on ARM Cortex-A9 Android 3 cores by OSCAR Compiler


















- Power for 3 cores was reduced to $1/5 \sim 1/7$ of that without software power control.
- Power for 3 cores was reduced to $1/2 \sim 1/3$ of that for ordinary 1 core execution.





## Automatic Power Reduction of OpenCV Face Detection on big.LITTLE ARM Processor



- Samsung Exynos 5422 Processor
  - 4x Cortex-A15 2.0GHz, 4x Cortex-A7 1.4GHz big.LITTLE Architecture
- 2GB LPDDR3 RAM
- Frequency can be changed per cluster











# 110 Times Speedup against the Sequential Processing for GMS Earthquake Wave Propagation Simulation on Hitachi SR16000 (Power7 Based 128 Core Linux SMP)





# Cancer Treatment: Carbon Ion Radiotherapy

(Previous best was 2.5 times speedup on 16 processors with hand optimization)

- 8.9 times speedup by 12 processors: Intel Xeon X5670 2.93GHz 12 core SMP (Hitachi HA8000)
- 55 times speedup by 64 processors (55.10 at 64 CPUs): IBM Power 7 64 core SMP (Hitachi SR16000)

# **Performance on Multicore Server for Latest Cancer Treatment Using Heavy Particle (Proton, Carbon Ion)**

327 times speedup on 144 cores

- Original sequential execution time of 2948 sec (50 minutes) using GCC was reduced to 9 sec with 144 cores (327.6 times speedup).
- Reduction of treatment cost and reservation waiting period is expected.
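The reported figures can be reproduced with the usual definitions of speedup (sequential time divided by parallel time) and parallel efficiency (speedup divided by core count). The sketch below only restates arithmetic on numbers quoted in these slides; the helper names are illustrative.

```python
def speedup(t_sequential, t_parallel):
    """Speedup of the parallel run over the sequential baseline."""
    return t_sequential / t_parallel


def efficiency(speedup_value, cores):
    """Speedup per core; 1.0 means perfect linear scaling."""
    return speedup_value / cores


s144 = speedup(2948, 9)  # sequential GCC time vs. time on 144 cores
print(round(s144, 1))                   # matches the 327.6 quoted on the slide
print(round(efficiency(s144, 144), 2))  # above 1.0, i.e. superlinear vs. the baseline

# Carbon ion radiotherapy results quoted on the previous slide.
print(round(efficiency(8.9, 12), 2))
print(round(efficiency(55, 64), 2))
```

An efficiency above 1.0 means the 144 core run is more than 144 times faster than the sequential GCC baseline; the slide does not state the cause.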











# **Engine Control by multicore with Denso**

Although parallel processing of engine control on multicore has so far been very difficult, Denso and Waseda achieved a 1.95 times speedup on a 2 core V850 multicore processor.

- Hard real-time automobile engine control by multicore using local memories
- Millions of lines of C code consisting of conditional branches and basic blocks


















# Speedups of MATLAB/Simulink Image Processing on Various 4core Multicores (Intel Xeon, ARM Cortex A15 and Renesas SH4A)



Buoy Detection : http://www.mathworks.co.jp/matlabcentral/fileexchange/44706-buoy-detection-using-simulink

Color Edge Detection : http://www.mathworks.co.jp/matlabcentral/fileexchange/28114-fast-edges-of-a-color-image--actual-color--not-convertingto-gravscale-/

Vessel Detection : http://www.mathworks.co.jp/matlabcentral/fileexchange/24990-retinal-blood-vessel-extraction/



# **OSCAR Heterogeneous Multicore**



- Data Transfer Unit
- Local Program Memory
- Local Data Memory
- Distributed Shared Memory
- Centralized Shared Memory
- Frequency/Voltage Control Register



# An Image of Static Schedule for Heterogeneous Multi-core with Data Transfer Overlapping and Power Control







# OSCAR API Ver. 2.0 for Homogeneous/Heterogeneous Multicores and Manycores

# List of Directives (22 directives)

- Parallel Execution API
  - parallel sections (\*)
  - flush (\*)
  - critical (\*)
  - execution
- Memory Mapping API
  - threadprivate (\*)
  - distributedshared
  - onchipshared
- Synchronization API
  - groupbarrier
- Data Transfer API
  - dma\_transfer
  - dma\_contiguous\_parameter
  - dma\_stride\_parameter
  - dma\_flag\_check
  - dma\_flag\_send
- Power Control API
  - fvcontrol
  - get\_fvstatus
- Timer API
  - get\_current\_time
- Accelerator
  - accelerator\_task\_entry
- Cache Control
  - cache\_writeback
  - cache\_selfinvalidate
  - complete\_memop
  - noncacheable
  - aligncache
- 2 hint directives for OSCAR compiler
  - accelerator\_task
  - oscar\_comment

(\* from OpenMP)

from V2.0







# **33 Times Speedup Using OSCAR Compiler and OSCAR API on RP-X** (Optical Flow with a hand-tuned library)















### 1987 OSCAR (Optimally Scheduled Advanced Multiprocessor): Co-design of Compiler and Architecture

Looking at various applications, design a parallelizing compiler and design a multiprocessor/multicore-processor to support compiler optimization









### OSCAR(Optimally Scheduled Advanced Multiprocessor) 1987









### OSCAR Memory Space (Global Address Space)

### LOCAL MEMORY SPACE

### SYSTEM MEMORY SPACE







# **Hierarchical Barrier Synchronization**

- Specifying a hierarchical group barrier
  - #pragma oscar group\_barrier (C)
  - !\$oscar group\_barrier (Fortran)
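The idea of a hierarchical barrier, where a group of cores synchronizes without waiting for cores in other groups, can be illustrated by analogy with Python's standard `threading.Barrier`. This is only a sketch of the concept and does not use or reproduce the OSCAR API; the function name, the group sizes, and the logged event names are all hypothetical.

```python
import threading


def run_hierarchical_barrier(groups=2, threads_per_group=2):
    """Run worker threads that first synchronize within their own group,
    then across all groups, and return the ordered event log."""
    group_barriers = [threading.Barrier(threads_per_group) for _ in range(groups)]
    global_barrier = threading.Barrier(groups * threads_per_group)
    log = []
    lock = threading.Lock()

    def record(event):
        with lock:
            log.append(event)

    def worker(g, t):
        record(("work", g, t))
        group_barriers[g].wait()   # waits only for threads of group g
        record(("group_done", g, t))
        global_barrier.wait()      # waits for every thread in every group
        record(("all_done", g, t))

    threads = [
        threading.Thread(target=worker, args=(g, t))
        for g in range(groups)
        for t in range(threads_per_group)
    ]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return log


for event in run_hierarchical_barrier():
    print(event)
```

The exact interleaving differs between runs, but within each group every `work` entry appears before any `group_done` entry of that group, and every `group_done` entry appears before any `all_done` entry.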







## 4 core multicore RP1 (2007), 8 core multicore RP2 (2008) and 15 core Heterogeneous multicore RPX (2010) developed in NEDO Projects with Hitachi and Renesas









# Automatic Parallelization of JPEG-XR for Drinkable Inner Camera

10 times more speedup is needed after parallelization for 128 cores of Power 7. Less than 35 mW power consumption is required.







# OSCAR Vector Multicore and Compiler for Embedded to Servers with OSCAR Technology





# Fujitsu VPP500/NWT














# Earth Simulator

(http://www.es.jamstec.go.jp/)

- Earth environmental simulation such as global warming, El Nino, and plate movement, for all lives on this planet.
- Developed in Mar. 2002 by STA (MEXT) and NEC with 400 M\$ investment under Dr. Miyoshi's direction.



Mr. Hajime Miyoshi

(Dr. Miyoshi passed away in Nov. 2001. NWT, VPP500, SX6)







# **OSCAR Technology Corp.**

Started up on Feb. 28, 2013: licensing all patents and the OSCAR compiler from Waseda Univ.

Founder and CEO: Dr. T. Ono (Ex-CEO of First Section-listed Company, Director of National U., Invited Prof. of Waseda U.)

Executives:

- Dr. M. Ohashi: COO (Ex- OO of Ono Sokki)
- Mr. A. Nodomi: CTO (Ex-Spansion)
- Mr. N. Ito (Ex-Visiting Prof., Tokyo Agricult. and Tech. U.)
- Dr. K. Shirai (Ex-President of Waseda U., Ex-Chairman of Japanese Open U.)
- Mr. K. Ashida (Ex-VP Sumitomo Trading, Ashida Consult. CEO)
- Mr. S. Tsuchida (Co-Chief Investment Officer of Innovation Network Corp. of Japan)

Auditor:

- Mr. S. Honda (Ex-Senior VP and General Manager of MUFG)
- Dr. S. Matuda (Emeritus Prof. of Waseda U., Chairman of WERU INVESTMENT)
- Mr. Y. Hirowatari (President of AGS Consulting)

Advisors:

- Prof. H. Kasahara (Waseda U.)
- Prof. K. Kimura (Waseda U.)















# **Future Multicore Products**



### Smart phones

- From everyday recharging to less than once a week
- Solar powered operation in emergency condition
- Keep health

### Next Generation Automobiles

- Safer, more comfortable, energy efficient, environment friendly
- Cameras, radar, car2car communication, internet information integrated brake, steering, engine, motor control

### Advanced medical systems

- Cancer treatment, Drinkable inner camera
- Emergency solar powered
- No cooling fan, no dust, clean, usable inside OP room

### Personal / Regional Supercomputers

- Solar powered with more than 100 times power efficiency (FLOPS/W)
- Regional Disaster Simulators: saving lives from tornadoes, localized heavy rain, fires with earthquakes

## **Summary**

- Waseda University Green Computing Systems R&D Center, supported by METI, has been researching low-power, high-performance Green Multicore hardware, software and applications with industry including Hitachi, Fujitsu, NEC, Renesas, Denso, Toyota, Olympus and OSCAR Technology.
- OSCAR Automatic Parallelizing and Power Reducing Compiler has achieved speedup and/or power reduction for scientific applications including "Earthquake Wave Propagation", medical applications including "Cancer Treatment Using Carbon Ion" and "Drinkable Inner Camera", and industry applications including "Automobile Engine Control", "Smartphone", and "Wireless Communication Base Band Processing" on various multicores from different vendors including Intel, ARM, IBM, AMD, Qualcomm, Freescale, Renesas and Fujitsu.
- In automatic parallelization, 110 times speedup for "Earthquake Wave Propagation Simulation" on 128 cores of IBM Power 7 against 1 core, 55 times speedup for "Carbon Ion Radiotherapy Cancer Treatment" on 64 cores of IBM Power 7, 1.95 times for "Automobile Engine Control" on Renesas 2 cores using SH4A or V850, and 55 times for "JPEG-XR Encoding for Capsule Inner Cameras" on Tilera 64 core Tile64 manycore.
  - The compiler will be available on the market from OSCAR Technology.
- In <u>automatic power reduction</u>, consumed power for real-time multimedia applications such as human face detection, H.264, MPEG2 and optical flow was reduced to 1/2 or 1/3 using 3 cores of ARM Cortex A9 and Intel Haswell, and to 1/4 using Renesas SH4A 8 cores, against ordinary single core execution.
- Local memory management for automobiles and software coherent control have been patented and already realized by the OSCAR compiler.



