### **OSCAR** Automatic Parallelizing and Power Reducing Multicore Compiler for <u>Realtime Embedded</u> to High Performance Computing



**Hironori Kasahara**, Ph.D., IEEE Fellow, IPSJ Fellow

- IEEE Computer Society President 2018
- Professor, Dept. of Computer Science & Engineering
- Director, Advanced Multicore Processor Research Institute
- Waseda University, Tokyo, Japan

URL: http://www.kasahara.cs.waseda.ac.jp/

**Career**

- 1980 BS, 1982 MS, 1985 Ph.D., Dept. of EE, Waseda Univ.
- 1985 Visiting Scholar, Univ. of California, Berkeley
- 1986 Assistant Professor, 1988 Associate Professor, 1997 Professor, Waseda Univ.; now Dept. of Computer Sci. & Eng.
- 1989-90 Research Scholar, Univ. of Illinois at Urbana-Champaign, Center for Supercomputing R&D
- 2004 Director, Advanced Multicore Research Institute
- 2017 Member, the Engineering Academy of Japan and the Science Council of Japan

**Awards**

- 2005 STARC Academia-Industry Research Award
- 2008 LSI of the Year Second Prize
- 2008 Intel Asia Academic Forum Best Research Award
- 2010 IEEE CS Golden Core Member Award
- 2014 Minister of Edu., Sci. & Tech. Research Prize
- 2015 IPSJ Fellow
- 2017 IEEE Fellow, IEEE Eta Kappa Nu

Reviewed papers: 216; invited talks: 155; published unexamined patent applications: 59 (Japan, US, GB, China; granted patents: 30); articles in newspapers, web news and other media including TV: 578

**Committees in societies and government: 255**

- IEEE Computer Society: President 2018, BoG (2009-14), Multicore STC Chair (2012-), Japan Chair (2005-07)
- IPSJ: Chair, HG for Mag. & J. Edit, Sig. on ARC
- [METI/NEDO] Project Leader: Multicore for Consumer Electronics, Advanced Parallelizing Compiler; Chair: Computer Strategy Committee
- [Cabinet Office] CSTP Supercomputer Strategic ICT PT, Japan Prize Selection Committees, etc.
- [MEXT] Info. Sci. & Tech. Committee, Supercomputers (Earth Simulator, HPCI Promo., Next Gen. Supercomputer K) Committees, etc.

### **IEEE Computer Society**



#### IEEE-USA (Regions 1-6)



### **IEEE Computer Society Members**







# IEEE Computer Society BoG (Board of Governors) Feb.1, 2018



14-Mar-18 https://www.computer.org/web/cshistory/officers-2018

### **Past IEEE Computer Society Presidents**

**Chairs of the IRE Professional Group on Electronic Computers**

1951-53 Morton M. Astrahan; 1953-54 John H. Howard; 1954-55 Harry Larson; 1955-56 Jean H. Felker; 1956-57 Jerre D. Noe; 1957-58 Werner Buchholz; 1958-59 Willis H. Ware; 1959-60 Richard O. Endres; 1960-62 Arnold A. Cohen; 1962-64 Walter L. Anderson

**Chairs of the AIEE Committee on Large-Scale Computing Devices**

1946-49 Charles Concordia; 1949-51 John Grist Brainerd; 1951-53 Walter H. MacWilliams; 1953-55 Frank J. Maginniss; 1955-57 Edwin L. Harder; 1957-59 Morris Rubinoff; 1959-61 Ruben A. Imm; 1961-63 Claude A. Kagan; 1963-64 Gerhard L. Hollander

**Chairs & Presidents of the IEEE Computer Society**

1964-65 Keith Uncapher; 1965-66 Richard L. Tanaka; 1966-67 Samuel Levine; 1968-69 Charles L. Hobbs; 1970-71 Edward J. McCluskey; 1972-73 Albert S. Hoagland; 1974-75 Stephen S. Yau; 1976 Dick B. Simmons; 1977-78 Merlin G. Smith; 1979-80 Tse-Yun Feng; 1981 Richard E. Merwin; 1982-83 Oscar N. Garcia; 1984-85 Martha Sloan; 1986-87 Roy L. Russo; 1988 Edward A. Parrish; 1989 Kenneth A. Anderson; 1990 Helen M. Wood; 1991 Duncan H. Lawrie; 1992 Bruce D. Shriver; 1993 James H. Aylor; 1994 Laurel V. Kaleda; 1995 Ronald G. Hoelzeman; 1996 Mario R. Barbacci; 1997 Barry W. Johnson; 1998 Doris L. Carver; 1999 Leonard L. Tripp; 2000 Guylaine M. Pollock; 2001 Benjamin W. Wah; 2002 Willis K. King; 2003 Stephen Diamond; 2004 Carl K. Chang; 2005 Gerald L. Engel; 2006 Deborah M. Cooper; 2007 Michael R. Williams; 2008 Rangachar Kasturi; 2009 Susan K. (Kathy) Land; 2010 James D. Isaak; 2011 Sorel Reisman; 2012 John W. Walz; 2013 David Alan Grier; 2014 Dejan S. Milojicic; 2015 Thomas M. Conte; 2016 Roger U. Fujii; 2017 Jean-Luc Gaudiot; 2018 Hironori Kasahara

**IPSJ/IEEE-CS Young Computer Researcher Award**: for members of the IPSJ and the IEEE-CS. The first award ceremony: COMPSAC 2018, July 23-27, NII, Tokyo (https://ieeecompsac.computer.org/2018/)



**Bjarne Stroustrup** 2018 Computer Society Computer Pioneer Award Columbia University



**Masaru Kitsuregawa** Director General of NII, Past President of IPSJ

Margaret Martonosi 2018 Computer Society Technical Achievement Award Princeton University

Dejan Milojicic CS President 2014 HP Labs CS 2022 Report



| Choose Your Content Bundle | Preferred Plus | Training & Development | Research | Basic | Student |
|---|---|---|---|---|---|
| IEEE Member: Add Computer Society to your existing IEEE membership. IEEE Membership is an additional charge.<br>Affiliate Member: Join Computer Society only as a Computer Society affiliate. IEEE benefits are not included.<br>Student Member: Students must join IEEE when joining Computer Society. IEEE Student members can add Computer Society. | IEEE Member: \$30<br>Affiliate Member: \$78 | IEEE Member: \$28<br>Affiliate Member: \$71 | IEEE Member: \$28<br>Affiliate Member: \$71 | IEEE Member: \$20<br>Affiliate Member: \$63 | IEEE Student Member: \$4<br>New Student Member: \$20 |
| Computer magazine (12 digital issues)* | ✓ | ✓ | ✓ | ✓ | ✓ |
| Members-only discounts on conferences and events | ✓ | ✓ | | | ✓ |
| Members-only webinars<br>Unlimited access to Computing Now, computer.org, and the new mobile-ready myCS | ✓<br>✓ | | ✓<br>✓ | ✓ | ✓<br>✓ |
| Local chapter membership | ✓ | ✓ | ✓ | ✓ | ✓ |
| Skillsoft Skillchoice™ Complete with 67,000+ books, videos, courses, practice exams and mentorship resources | ✓ | ✓ | | | ✓ |
| Books24x7 on-demand access to 15,000 technical and business resources | ✓ | ✓ | | | |
| Two complimentary Computer Society magazine subscriptions | | | | | |
| myComputer mobile app | 30 tokens | | 30 tokens | | 30 tokens |
| Computer Society Digital Library | 12 FREE downloads | Member pricing | 12 FREE downloads | Member pricing | Included |
| Training webinars | 3 FREE webinars | 3 FREE webinars | Member pricing | Member pricing | Member pricing |
| Priority registration to Computer Society events | ✓ | | | | |
| Right to vote and hold office | ✓ | | ✓ | ✓ | |
| One-time 20% Computer Society online store discount | ✓ | | | | |

\* Print publications are available for an additional fee. See IEEE catalog for details.

https://www.computer.org/web/education/multicore-video-series#


### **Multicore Video Series**

- Automatic Parallelization: David Padua
- Autoparallelization for GPUs: Wen-mei Hwu
- Dependences and Dependence Analysis: Utpal Banerjee
- Dynamic Parallelization: Rudolf Eigenmann
- Instruction Level Parallelization: Alexandru Nicolau
- Multigrain Parallelization and Power Reduction: Hironori Kasahara
- The Polyhedral Model: Paul Feautrier
- Vector Computation: <u>David Kuck (Computer Pioneer)</u>
- Vectorization: P. Sadayappan
- Vectorization/Parallelization in the IBM Compiler: Yaoqing Gao
- Vectorization/Parallelization in the Intel Compiler: Peng Tu
- Roundtable Discussion by all presenters

### Self-Paced Learning:

Approximate time = 12 hours

- PDH: 12.0
- CEU: 1.2

### **Full Series Price:**

- IEEE CS Member: \$195
- Nonmember: \$1,000

### Individual Videos:

- IEEE CS Member: \$30
- Nonmember: \$125


For questions, please contact certification@computer.org.




## **Toward 2018**

1. Refining content and services to further improve the satisfaction of CS members;
2. Considering incentives for volunteers to further accelerate CS activities and promptly provide technical benefits for people around the globe. To express appreciation to volunteers: <u>CS Point (Mileage) System: Annual & Lifetime Honor,</u> Premier Seating, Premier Registration, Distinguished Reviewer, etc.;
3. Offering more attractive services for practitioners in industry;
4. Providing the world's best educational content and historical treasures for future generations, which only the CS can create with our pioneering researchers (for example, the Multicore Compiler Video Series found at www.computer.org/web/education/multicore-video-series);
5. Thinking about sustainable membership fees while considering the diversity of economic situations within the 10 regions;
6. Cooperating with other IEEE societies and sister societies in a timely and efficient manner;
7. Intelligibly introducing the latest computer-related technologies to younger generations, including children, so that they can realize their technological dreams.



## **Multicores for Performance and Low Power**

Power consumption is one of the biggest obstacles to performance scaling, from smartphones to cloud servers and supercomputers (the "K" computer consumes more than 10 MW).



IEEE ISSCC08: Paper No. 4.5, M. Ito, ... and H. Kasahara, "An 8640 MIPS SoC with Independent Power-off Control of 8 CPUs and 8 RAMs by an Automatic Parallelizing Compiler"

Dynamic power scales as $P \propto f \cdot V^2$, and the supply voltage scales with frequency ($V \propto f$), so

$$P \propto f^3$$

If the frequency is reduced to 1/4 (e.g., 4 GHz → 1 GHz), power is reduced to $(1/4)^3 = 1/64$, while performance falls to 1/4.

**<u>Multicores</u>:** if 8 such cores are integrated on a chip, total power is still only $8 \times 1/64 = 1/8$, while **<u>performance</u>** becomes $8 \times 1/4 = 2$ times that of the original single core.
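A quick numerical check of the argument above, as a minimal C sketch. It follows the slide's model ($V \propto f$, so $P \propto f^3$) and additionally assumes that performance scales linearly with both frequency and core count, which real programs reach only when the compiler exposes enough parallelism. The function names are hypothetical.

```c
#include <assert.h>

/* Relative dynamic power under the slide's model: P proportional to f * V^2
   with V proportional to f, i.e. P proportional to f^3, summed over cores.
   f_ratio is (new frequency) / (original frequency). */
double relative_power(double f_ratio, int cores)
{
    assert(f_ratio > 0.0 && cores > 0);
    return cores * f_ratio * f_ratio * f_ratio;
}

/* Relative performance, assuming (optimistically) that throughput scales
   linearly with clock frequency and with the number of cores. */
double relative_performance(double f_ratio, int cores)
{
    assert(f_ratio > 0.0 && cores > 0);
    return cores * f_ratio;
}
```

With a quarter of the clock, `relative_power(0.25, 8)` evaluates to 0.125 and `relative_performance(0.25, 8)` to 2.0, the 1/8 power and 2 times performance quoted above.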



- A commercially available automatic parallelizing compiler gave no speedup on 64 cores relative to 1-core execution.
- With 128 cores, execution was even slower than on 1 core (0.9 times speedup).

- The advanced OSCAR parallelizing compiler gave a 211 times speedup on 128 cores relative to 1-core execution with the commercial compiler.
  - Even on 1 core, the OSCAR compiler gave a 2.1 times speedup over the commercial compiler through global cache optimization.



## **OSCAR Parallelizing Compiler**

### To improve effective performance, cost-performance and software productivity and reduce power

**Multigrain Parallelization** (LCPC 1991, 2001, 2004): coarse-grain parallelism among loops and subroutines (2000 on SMP) and near-fine-grain parallelism among statements (1992), in addition to loop parallelism

### **Data Localization**

Automatic data management for distributed shared memory, cache and local memory (local memory 1995, 2016 on RP2; cache 2001, 2003); software coherence control (2017)

### Data Transfer Overlapping (2016, partially)

Data transfer overlapping using Data Transfer Controllers (DMAs)

### **Power Reduction**

(2005 for Multicore, 2011 Multi-processes, 2013 on ARM)

Reduction of power consumption by compiler-controlled DVFS and power gating, with hardware support.



## **Generation of Coarse Grain Tasks**

### Macro-tasks (MTs)

- Block of Pseudo Assignments (BPA): Basic Block (BB)
- Repetition Block (RB) : natural loop
- Subroutine Block (SB): subroutine
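As a hypothetical illustration of the three macro-task kinds, the C function below is annotated with the macro-task each part would form. The partition is a hand-made sketch, not actual OSCAR compiler output, and it assumes `x` and `y` do not overlap.

```c
#include <stddef.h>

/* SB: a call to this subroutine forms one Subroutine Block macro-task. */
static double dot(const double *a, const double *b, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* Hypothetical example; x and y are assumed not to overlap. */
double macro_task_demo(double *x, double *y, size_t n)
{
    /* BPA (basic block): straight-line assignments. */
    double alpha = 2.0;
    double beta = 1.0;

    /* RB (natural loop) #1: writes only x. */
    for (size_t i = 0; i < n; i++)
        x[i] = alpha * x[i];

    /* RB (natural loop) #2: writes only y, so it does not depend on RB #1;
       the two RBs are candidates for coarse-grain parallel execution. */
    for (size_t i = 0; i < n; i++)
        y[i] = y[i] + beta;

    /* SB: depends on both RBs through x and y. */
    return dot(x, y, n);
}
```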



## Speedup ratio for H.264 and Optical Flow on ARM Cortex-A9 Android 3 cores by OSCAR Automatic Parallelization



## Automatic Power Reduction on ARM CortexA9 with Android

http://www.youtube.com/channel/UCS43INYEIkC8i\_KIgFZYQBQ

ODROID X2: Samsung Exynos4412 Prime, ARM Cortex-A9 quad core, 0.2 to 1.7 GHz, used in Samsung's Galaxy S3



With 3 cores, power was reduced to 1/5 to 1/7 of that without software power control, and to 1/2 to 1/3 of that of ordinary 1-core execution.

## Automatic Power Reduction on Intel Haswell: H.264 Decoder & Optical Flow (3 cores)



With 3 cores, power was reduced to 1/3 to 1/4 of that without software power control, and to 2/5 to 1/3 of that of ordinary 1-core execution.

## Automatic Power Reduction of OpenCV Face Detection on big.LITTLE ARM Processor



## 110 Times Speedup against the Sequential Processing for GMS Earthquake Wave Propagation Simulation on Hitachi SR16000

### (Power7 Based 128 Core Linux SMP) (LCPC2015)



### Performance on Multicore Server for Latest Cancer Treatment Using Heavy Particle (Proton, Carbon Ion) 327 times speedup on 144 cores

Hitachi 144-core SMP Blade Server BS500: Xeon E7-8890 v3 (2.5 GHz, 18 cores/chip) x 8 chips



- The original sequential execution time of 2948 sec (about 50 minutes) using GCC was reduced to 9 sec with 144 cores (327.6 times speedup).
  - Reductions in treatment cost and in the reservation waiting period are expected.

## Earliest Executable Condition Analysis for Coarse Grain Tasks (Macro-tasks)



### **PRIORITY DETERMINATION IN DYNAMIC CP METHOD**
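The slides do not detail the priority rule. A common critical-path (CP) rule, assumed here, gives each macro-task a priority equal to the longest estimated execution time from that task to the exit node of the macro-task graph; a dynamic scheduler then starts the ready macro-task with the highest priority first. The C sketch below computes that quantity; the OSCAR dynamic CP method may refine it (for example, around conditional branches), and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_MT 16

/* Critical-path style priority: for each macro-task, the longest estimated
   execution time along any path from it to the exit node of the MTG.
   Assumes macro-tasks are numbered in topological order (every edge
   succ[i][j] != 0 has i < j) and that all costs are non-negative. */
void cp_priority(size_t n, const double cost[],
                 unsigned char succ[][MAX_MT], double prio[])
{
    assert(n <= MAX_MT);
    for (size_t k = n; k-- > 0;) {      /* visit tasks from the exit backwards */
        double longest = 0.0;           /* longest path among successors */
        for (size_t j = k + 1; j < n; j++)
            if (succ[k][j] && prio[j] > longest)
                longest = prio[j];
        prio[k] = cost[k] + longest;
    }
}
```

For a diamond-shaped graph 0 → {1, 2} → 3 with costs 1, 5, 2 and 1, the priorities are 7, 6, 3 and 1, so when tasks 1 and 2 are both ready, task 1 (on the critical path) is started first.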



## **Earliest Executable Conditions**

| Macro-task No. | Earliest Executable Condition |
|---|---|
| 1 | |
| 2 | 1 2 |
| 3 | (1) 3 |
| 4 | 2 4 OR (1) 3 |
| 5 | (4) 5 AND [ 2 4 OR (1) 3 ] |
| 6 | 3 OR (2) 4 |
| 7 | 5 OR (4) 6 |
| 8 | (2) 4 OR (1) 3 |
| 9 | (8) 9 |
| 10 | (8) 10 |
| 11 | 8 9 OR 8 10 |
| 12 | 11 12 AND [ 9 OR (8) 10 ] |
| 13 | 11 13 OR 11 12 |
| 14 | (8) 9 OR (8) 10 |
| 15 | 2 15 |
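The conditions above combine macro-task completions and branch directions with AND/OR. Since the exact meaning of each row's symbols is not spelled out here, this sketch does not decode the rows; it only shows, under a hypothetical encoding, how a dynamic scheduler could test such a condition at run time, where a leaf is either "macro-task i has completed" or "macro-task i has branched to macro-task j".

```c
#include <stdbool.h>

/* Hypothetical encoding of an earliest executable condition. */
enum cond_kind { COND_DONE, COND_BRANCH, COND_AND, COND_OR };

struct cond {
    enum cond_kind kind;
    int i, j;                     /* COND_DONE: MT i has completed;
                                     COND_BRANCH: MT i branched to MT j */
    const struct cond *lhs, *rhs; /* operands of COND_AND / COND_OR */
};

/* Run-time state seen by a dynamic scheduler. */
struct mt_state {
    const bool *done;      /* done[i]: macro-task i has completed */
    const int *branch_to;  /* branch_to[i]: target chosen by MT i, -1 if undecided */
};

/* True when the condition holds, i.e. the macro-task may start now. */
bool eval_cond(const struct cond *c, const struct mt_state *s)
{
    switch (c->kind) {
    case COND_DONE:   return s->done[c->i];
    case COND_BRANCH: return s->branch_to[c->i] == c->j;
    case COND_AND:    return eval_cond(c->lhs, s) && eval_cond(c->rhs, s);
    case COND_OR:     return eval_cond(c->lhs, s) || eval_cond(c->rhs, s);
    }
    return false;
}
```

A scheduler would re-evaluate the conditions of waiting macro-tasks whenever a macro-task finishes or a branch is decided, and move those that become true to the ready queue.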

## Automatic processor assignment in 103.su2cor

- Using 14 processors

**Coarse grain parallelization within DO400** 



### MTG of Su2cor-LOOPS-DO400

### Coarse grain parallelism PARA\_ALD = 4.3



## **Data-Localization: Loop Aligned Decomposition**

- Decompose multiple loops (Doall and sequential) into CARs and LRs, considering inter-loop data dependence.
  - Most data in LR can be passed through LM.
  - LR: Localizable Region, CAR: Commonly Accessed Region
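A minimal C sketch of the idea, using a hypothetical two-loop program. Both loops are cut into the same iteration chunks per core, so most data passes between them inside one core's local memory (LR) and only chunk-boundary elements are commonly accessed (CAR). The cores are simulated sequentially, and the function returns how many CAR elements cross a chunk boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical two-loop program:
     loop 1: b[i] = 2 * a[i]           for 0 <= i < n
     loop 2: c[i] = b[i - 1] + b[i]    for 1 <= i < n
   Both loops use the SAME chunk per core, so the b[] values a core's
   loop-2 chunk reads were mostly produced by its own loop-1 chunk (LR).
   Only b[lo - 1] comes from the neighbouring chunk (CAR data). */
size_t aligned_decomposition(const double *a, double *b, double *c,
                             size_t n, size_t cores)
{
    assert(cores > 0);
    size_t chunk = (n + cores - 1) / cores;
    size_t car_reads = 0;

    /* Phase 1: each core runs its chunk of loop 1 (in parallel on hardware). */
    for (size_t p = 0; p < cores; p++) {
        size_t lo = p * chunk, hi = (lo + chunk < n) ? lo + chunk : n;
        for (size_t i = lo; i < hi; i++)
            b[i] = 2.0 * a[i];
    }

    /* Barrier, then phase 2: each core runs the aligned chunk of loop 2. */
    for (size_t p = 0; p < cores; p++) {
        size_t lo = p * chunk, hi = (lo + chunk < n) ? lo + chunk : n;
        for (size_t i = (lo > 0) ? lo : 1; i < hi; i++) {
            if (i == lo)          /* b[lo - 1] was produced by core p - 1 */
                car_reads++;
            c[i] = b[i - 1] + b[i];
        }
    }
    return car_reads;
}
```

For 8 iterations, 2 cores report one CAR read and 4 cores report three: each extra chunk boundary adds one element that must be transferred between cores instead of staying in local memory.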





## Inter-loop data dependence analysis in TLG

- Define the exit RB in the TLG as the Standard-Loop
- Find the iterations on which an iteration of the Standard-Loop is data dependent
  - e.g., the K-th iteration of RB3 is data dependent on the (K-1)-th and K-th iterations of RB2, and on the (K-1)-th, K-th and (K+1)-th iterations of RB1



Example of TLG
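Using the dependence pattern of the example above (iteration K of RB3 needs iterations K-1 and K of RB2, and K-1 to K+1 of RB1), the iterations of an earlier loop that a chunk of the Standard-Loop needs follow from the minimum and maximum dependence offsets. This C sketch assumes those offsets are already known; a real compiler would derive them from array subscript analysis. Names are hypothetical.

```c
/* Inclusive range of loop iterations. */
struct iter_range { long lo, hi; };

/* Iterations of an earlier loop in the TLG that a chunk of the
   Standard-Loop depends on, when iteration K of the Standard-Loop depends
   on iterations K + min_off ... K + max_off of that loop.
   The result is clipped to that loop's own bounds [0, last]. */
struct iter_range dependent_iterations(struct iter_range chunk,
                                       long min_off, long max_off, long last)
{
    struct iter_range r = { chunk.lo + min_off, chunk.hi + max_off };
    if (r.lo < 0)
        r.lo = 0;
    if (r.hi > last)
        r.hi = last;
    return r;
}
```

For the chunk K = 10..19 of RB3, RB2 iterations 9..19 and RB1 iterations 9..20 are needed. The ranges of adjacent chunks overlap, which is where commonly accessed regions come from.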

## Target Loop Group Creation and Inter-Loop Dependence Analysis

### Target Loop Groups

- grouped loops that access the same array
- baseline loop chosen for each group
  - the largest estimated time loop
- Inter-Loop Dependency Analysis
  - data dependencies between loops within the TLGs
  - detects relevant iterations of those loops that have dependence with the iterations of the baseline loop



#### Inter-Loop dependence

## **Decomposition of RBs in TLG**

- Decompose the GCIR into $DGCIR^p$ ($1 \le p \le n$)
  - $n$: the number of PCs (or a multiple of it); DGCIR: Decomposed GCIR
- Generate CARs on which both $DGCIR^p$ and $DGCIR^{p+1}$ are data dependent
- Generate LRs on which $DGCIR^p$ is data dependent
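A hedged C sketch of the decomposition step: the GCIR is modelled as iterations 0 to total-1 and split into n nearly equal contiguous pieces, and the overlap of the earlier-loop ranges needed by two adjacent pieces gives a CAR candidate. How OSCAR sizes the pieces is not stated on the slide; equal splitting is an assumption, and the names are hypothetical.

```c
#include <assert.h>

/* Inclusive iteration range; empty when lo > hi. */
struct iter_range { long lo, hi; };

/* Split the GCIR, modelled as iterations 0 .. total-1, into n nearly equal
   contiguous pieces. Piece p (0-based) corresponds to DGCIR^(p+1) in the
   slide's 1-based numbering. */
struct iter_range dgcir_piece(long total, long n, long p)
{
    assert(n > 0 && p >= 0 && p < n);
    long base = total / n, extra = total % n;
    long lo = p * base + (p < extra ? p : extra);   /* first `extra` pieces get one more */
    long len = base + (p < extra ? 1 : 0);
    struct iter_range r = { lo, lo + len - 1 };
    return r;
}

/* Iterations of an earlier loop needed by two adjacent pieces: their
   overlap is data both pieces depend on, i.e. a CAR candidate. */
struct iter_range overlap(struct iter_range a, struct iter_range b)
{
    struct iter_range r = { a.lo > b.lo ? a.lo : b.lo,
                            a.hi < b.hi ? a.hi : b.hi };
    return r;
}
```

Splitting 10 iterations into 3 pieces gives 0..3, 4..6 and 7..9. If the first two pieces need iterations 0..4 and 3..7 of an earlier loop, iterations 3..4 form the CAR between them, and the rest of each range is local to its piece.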



## Low-Power Optimization with OSCAR API





# Engine Control by multicore with Denso

Although parallel processing of engine control on multicores had so far been considered very difficult, Denso and Waseda achieved a 1.95 times speedup on a 2-core V850 multicore processor.



Hard real-time automobile engine control by multicore using local memories

- Millions of lines of C code consisting of conditional branches and basic blocks





## **Macro Task Fusion for Static Task Scheduling**



# **3.1 Restructuring: Inline Expansion**

Inline expansion is effective

**To increase coarse grain parallelism** 

Expands functions having inner parallelism

Improves coarse grain parallelism
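The effect can be sketched in C with a hypothetical callee that contains two independent loops. Before expansion, the caller's macro-task graph sees the call as one Subroutine Block; after expansion, the loops become separate macro-tasks in the caller. This is a hand-written illustration of the transformation, not OSCAR compiler output.

```c
/* Callee with inner parallelism: two loops on different arrays. */
static void update_pair(double *x, double *y, int n)
{
    for (int i = 0; i < n; i++)
        x[i] += 1.0;
    for (int i = 0; i < n; i++)
        y[i] *= 2.0;
}

/* Before inline expansion: the call is a single Subroutine Block (SB)
   in the caller's macro-task graph. */
void caller_before(double *x, double *y, int n)
{
    update_pair(x, y, n);
}

/* After inline expansion: the callee's loops become two Repetition
   Blocks (RBs) in the caller. Assuming x and y do not overlap, neither
   RB depends on the other, so they can be assigned to different cores. */
void caller_after(double *x, double *y, int n)
{
    for (int i = 0; i < n; i++)
        x[i] += 1.0;
    for (int i = 0; i < n; i++)
        y[i] *= 2.0;
}
```

Both versions compute the same result; only the granularity visible to the coarse-grain scheduler changes.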



MTG before inline expansion

MTG after inline expansion

### MTG of Crankshaft Program Using Inline Expansion



#### Not enough coarse grain parallelism yet!

## **3.2 Restructuring: Duplicating If-statements**

Duplicating if-statements is effective

- To increase coarse grain parallelism
- Duplicates fused tasks having inner parallelism
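A minimal C sketch of the transformation, assuming the condition is not modified inside the guarded code: one if-statement guarding two independent loops is split into two if-statements with the same condition, so each loop can become its own macro-task. The functions are hypothetical.

```c
/* Before duplication: one if-statement guards two independent loops,
   so the whole statement is a single macro-task. */
void before_dup(int flag, double *x, double *y, int n)
{
    if (flag) {
        for (int i = 0; i < n; i++)
            x[i] += 1.0;
        for (int i = 0; i < n; i++)
            y[i] *= 2.0;
    }
}

/* After duplicating the if-statement: each loop gets its own copy of the
   condition and can become a separate macro-task. This is only legal
   because neither loop modifies `flag` and, assuming x and y do not
   overlap, the loops do not depend on each other. */
void after_dup(int flag, double *x, double *y, int n)
{
    if (flag) {
        for (int i = 0; i < n; i++)
            x[i] += 1.0;
    }
    if (flag) {
        for (int i = 0; i < n; i++)
            y[i] *= 2.0;
    }
}
```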



## MTG of Crankshaft Program Using Inline Expansion and Duplicating If-statements



# **Evaluation of Crankshaft Program** with Multi-core Processors



- Attains a 1.54 times speedup on RPX
  - This program is difficult to parallelize: it has no loops, only many conditional branches and small basic blocks
- This result shows the potential of multicore processors for engine control programs

## **OSCAR Compile Flow for Simulink Applications**



## Speedups of MATLAB/Simulink Image Processing on Various 4core Multicores

(Intel Xeon, ARM Cortex A15 and Renesas SH4A)




Vessel Detection: <u>http://www.mathworks.co.jp/matlabcentral/fileexchange/24990-retinal-blood-vessel-extraction/</u>

## **OSCAR Heterogeneous Multicore**



### An Image of Static Schedule for Heterogeneous Multicore with Data Transfer Overlapping and Power Control





Power Reduction in a real-time execution controlled by OSCAR Compiler and OSCAR API on RP-X (Optical Flow with a hand-tuned library)



## Software Coherence Control Method on OSCAR Parallelizing Compiler

- Coarse grain task parallelization with earliest executable condition analysis (control and data dependency analysis to detect parallelism among coarse grain tasks).
- The OSCAR compiler automatically controls coherence using the following simple program restructuring methods:
  - To cope with stale data problems:
    - Data synchronization by compilers
  - To cope with false sharing problems:
    - Data alignment
    - Array padding
    - Non-cacheable buffer
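A minimal C11 sketch of the data alignment and array padding remedies for false sharing, written by hand for illustration (the OSCAR compiler applies such restructuring automatically). The 64-byte cache line is an assumption; the real line size depends on the target processor. Stale-data handling is not shown.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64   /* assumed line size in bytes; target-dependent */
#define MAX_CORES 8

/* Unpadded: counters of neighbouring cores share cache lines, so one
   core's writes invalidate the line in other cores' caches (false sharing). */
struct counters_unpadded {
    long count[MAX_CORES];
};

/* Data alignment + array padding: each element starts on its own cache
   line and fills it, so no two cores write to the same line. */
struct padded_counter {
    alignas(CACHE_LINE) long count;
    char pad[CACHE_LINE - sizeof(long)];
};

struct counters_padded {
    struct padded_counter per_core[MAX_CORES];
};

/* Each core updates only its own slot. */
void bump(struct counters_padded *c, int core)
{
    c->per_core[core].count++;
}
```

Padding trades memory (here 8 cache lines instead of one or two) for the absence of coherence traffic between cores.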

Figure: MTG generated by earliest executable condition analysis (legend: data dependency, extended control dependency, conditional branch, OR, AND, original control flow)


# 8 Core RP2 Chip Block Diagram



## Automatic Software Coherent Control for Manycores

Performance of Software Coherence Control by OSCAR Compiler on 8-core RP2



**Automatic Local Memory Management Data Localization: Loop Aligned Decomposition** 

- Decomposed loop into LRs and CARs
  - LR (Localizable Region): Data can be passed through LDM
  - CAR (Commonly Accessed Region): Data transfers are required among processors

**Single dimension Decomposition** 







# **Adjustable Blocks**

Handling a suitable block size for each application

- different from a fixed block size in cache

| Level | Blocks (level 0: 1 block on local memory) |
|---|---|
| 0 | $B_0^0$ |
| 1 | $B_0^1$, $B_1^1$ |
| 2 | $B_0^2$, $B_1^2$, $B_2^2$, $B_3^2$ |
| 3 | $B_0^3$, $B_1^3$, $B_2^3$, $B_3^3$, $B_4^3$, $B_5^3$, $B_6^3$, $B_7^3$ |
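Assuming, as the figure suggests, that each level halves the block size (level $L$ holds $2^L$ blocks), the location of a block and a suitable level for a given data size can be computed as in this C sketch. The names and the 16 KB size in the example are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout: level 0 is one block covering the local-memory area,
   and each deeper level halves the block size (level L has 2^L blocks). */
struct lm_block { size_t offset, size; };

struct lm_block lm_block_at(size_t lm_bytes, unsigned level, size_t index)
{
    assert(level < 16 && index < ((size_t)1 << level));
    struct lm_block b;
    b.size = lm_bytes >> level;
    b.offset = index * b.size;
    return b;
}

/* Deepest level whose blocks still hold `need` bytes: the smallest
   block size (and largest number of blocks) that fits the data. */
unsigned lm_level_for(size_t lm_bytes, size_t need, unsigned max_level)
{
    unsigned level = 0;
    while (level < max_level && (lm_bytes >> (level + 1)) >= need)
        level++;
    return level;
}
```

On a 16 KB area, block $B_5^3$ is 2 KB long and starts at byte 10240, and 3000 bytes of data fit level 2 (4 KB blocks) but not level 3. `lm_level_for` does not check that the data fits even level 0; a caller would need to handle that case.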

# **Multi-dimensional Template Arrays** for Improving Readability

- a mapping technique for arrays with varying dimensions
  - each block on LDM corresponds to multiple empty arrays with varying dimensions
  - these arrays have an additional dimension to store the corresponding block number
    - TA[Block#][] for single dimension
    - TA[Block#][][] for double dimension
    - TA[Block#][][][] for triple dimension
    - …
- LDM is represented as a one-dimensional array
  - without Template Arrays, multidimensional arrays have complex index calculations
    - $A[i][j][k] \rightarrow TA[offset + i' * L + j' * M + k']$
  - Template Arrays provide readability
    - A[i][j][k] -> TA[Block#][i'][j'][k']
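In C, template arrays can be expressed as pointers to array types that alias the flat LDM array, so `TA[Block#][i'][j'][k']` indexing is computed by the C compiler instead of written by hand. The sizes, `ldm` and the helper functions are hypothetical; in this layout, `offset` corresponds to `Block# * BLOCK_WORDS`, $L$ to `D2 * D3` and $M$ to `D3`.

```c
#include <stddef.h>

/* Hypothetical sizes: the local data memory (LDM) is modelled as one flat
   array of NBLOCKS equal blocks, each BLOCK_WORDS doubles long. */
#define NBLOCKS 4
#define D1 4
#define D2 8
#define D3 8
#define BLOCK_WORDS (D1 * D2 * D3)

double ldm[NBLOCKS * BLOCK_WORDS];

/* Template arrays: views of the same storage whose first index is the
   block number, like TA[Block#][] and TA[Block#][][][] on the slide. */
typedef double ta1_t[BLOCK_WORDS];
typedef double ta3_t[D1][D2][D3];

ta1_t *template1(void) { return (ta1_t *)ldm; }
ta3_t *template3(void) { return (ta3_t *)ldm; }
```

Writing `template3()[2][1][3][5]` touches the same word as `ldm[2 * BLOCK_WORDS + (1 * D2 + 3) * D3 + 5]`, which is the hand-written index calculation the template array hides.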



# **Block Replacement Policy**

### Compiler Control Memory block Replacement

- using live, dead and reuse information of each variable from the scheduled result
- different from LRU in cache that does not use data dependence information

## Block Eviction Priority Policy

1. (Dead) Variables that will not be accessed later in the program
2. Variables that are accessed only by other processor cores
3. Variables that will be accessed later by the current processor core
4. Variables that will be accessed immediately by the current processor core
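The policy above can be sketched as a victim-selection function in C, assuming the compiler has already classified, from the static schedule, the variable held in each local-memory block. Tie-breaking and write-back of evicted data are not covered by the slide and are not modelled; names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Eviction priority from the slide; the lowest value is evicted first. */
enum use_class {
    USE_DEAD = 1,          /* not accessed later in the program */
    USE_OTHER_CORES = 2,   /* accessed later only by other cores */
    USE_LATER_HERE = 3,    /* accessed later by the current core */
    USE_NEXT_HERE = 4      /* accessed immediately by the current core */
};

/* Choose the block to evict: the one whose variable has the lowest class.
   Ties go to the lowest block index (an arbitrary choice in this sketch). */
size_t choose_victim(size_t nblocks, const enum use_class cls[])
{
    assert(nblocks > 0);
    size_t victim = 0;
    for (size_t i = 1; i < nblocks; i++)
        if (cls[i] < cls[victim])
            victim = i;
    return victim;
}
```

Unlike LRU, the choice depends on future accesses known from the compiler's schedule rather than on past access order.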

### Speedups by the Proposed Local Memory Management Compared with Utilizing Shared Memory on Benchmark Applications Using RP2

20.12 times speedup for AACenc with 8-core execution using local memory, compared with sequential execution using the off-chip shared memory of RP2

### **1987 OSCAR (Optimally Scheduled Advanced Multiprocessor)**

**Co-design of Compiler and Architecture** 

Looking at various applications, design a parallelizing compiler, and design a multiprocessor/multicore processor that supports the compiler's optimizations



### **OSCAR (Optimally Scheduled Advanced Multiprocessor)**



#### **OSCAR Memory Space (Global Address Space)**



#### LOCAL MEMORY SPACE


# Hierarchical Barrier Synchronization

- Specifying a hierarchical group barrier
  - `#pragma oscar group_barrier` (C)
  - `!$oscar group_barrier` (Fortran)



## VPP500/NWT






### 4 core multicore RP1 (2007), 8 core multicore RP2 (2008) and 15 core Heterogeneous multicore RPX (2010) developed in NEDO Projects with Hitachi and Renesas



### **OSCAR Vector Multicore and Compiler for Embedded to Servers with OSCAR Technology**



## **Future Multicore Products**

**Advanced medical systems** 



#### **Next Generation Automobiles**

- Safer, more comfortable, energy efficient, environment friendly
- Cameras, radar, car-to-car communication and Internet information integrated with brake, steering, engine and motor control

#### Smart phones



- From daily recharging to recharging less than once a week
- Solar-powered operation in emergency conditions
- Health maintenance



#### Cancer treatment, Drinkable inner camera

- Solar-powered operation in emergencies
- No cooling fan and no dust, so clean enough to use inside the operating room

#### Personal / Regional Supercomputers



Solar-powered, with more than 100 times higher power efficiency (FLOPS/W)

Regional disaster simulators saving lives from tornadoes, localized heavy rain, and fires caused by earthquakes

### **IEEE Computer Society**

More than 60,000 computer scientists and IT professionals in 168 countries driving technological innovation.



#### **Our Vision:**

To be the leading provider of technical information, community services, and personalized services to the world's computing professionals.





#### **Summary**

- Waseda University's Green Computing Systems R&D Center, supported by METI, has been researching low-power, high-performance green multicore hardware, software and applications with industry partners including Hitachi, Fujitsu, NEC, Renesas, Denso, Toyota, Olympus and OSCAR Technology.
- The OSCAR Automatic Parallelizing and Power Reducing Compiler has achieved speedups and/or power reductions for scientific applications including "Earthquake Wave Propagation", medical applications including "Cancer Treatment Using Carbon Ion" and "Drinkable Inner Camera", and industrial applications including "Automobile Engine Control", "Smartphone" and "Wireless Communication Base Band Processing", on various multicores from different vendors including Intel, ARM, IBM, AMD, Qualcomm, Freescale, Renesas and Fujitsu.
- In automatic parallelization: 110 times speedup for "Earthquake Wave Propagation Simulation" on 128 cores of IBM Power7 against 1 core, 55 times speedup for "Carbon Ion Radiotherapy Cancer Treatment" on 64 cores of IBM Power7, 1.95 times for "Automobile Engine Control" on 2 Renesas cores using SH4A or V850, and 55 times for "JPEG-XR Encoding for Capsule Inner Cameras" on the Tilera 64-core Tile64 manycore.
  - The compiler will be available on the market from OSCAR Technology.
- In automatic power reduction, the power consumption of real-time multimedia applications such as human face detection, H.264, MPEG-2 and optical flow was reduced to 1/2 to 1/3 using 3 cores of ARM Cortex-A9 and Intel Haswell, and to 1/4 using 8 Renesas SH4A cores, compared with ordinary single-core execution.
- Local memory management for automobiles and software coherence control have been patented and are already implemented in the OSCAR compiler.