# OSCAR Parallelizing Compiler Cooperative Heterogeneous Multi-core Architecture

Akihiro Hayashi, Yasutaka Wada, Hiroaki Shikano, Teruo Kamiyama, Takeshi Watanabe, Takeshi Sekiguchi, Masayoshi Mase

Department of Computer Science and Engineering, Waseda University, Tokyo, Japan {ahayashi,yasutaka,shikano,kamiyama,watanabe,takeshi,mase}@kasahara.cs.waseda.ac.jp

## 1. Background

Heterogeneous multi-core architectures, which integrate multiple general-purpose CPU cores and special-purpose accelerator cores on a chip, have become widespread. However, heterogeneous multi-cores require very difficult coding for load distribution among CPU cores and accelerator cores, synchronization, and data transfer using DMA controllers. To release application programmers from such painful work, a powerful parallelizing compiler for heterogeneous multi-core architectures is needed. Furthermore, cooperation between the parallelizing compiler and the heterogeneous multi-core architecture is important to fully exploit the potential of these systems. Considering this situation, this poster proposes an OSCAR parallelizing compiler cooperative heterogeneous multi-core architecture.

## 2. The Proposed Architecture

The architecture comprises the following compiler-aware features (Figure 1): (1) local data memory and distributed shared memory; (2) an advanced DMA controller called the DTU (Data Transfer Unit), which enables task execution and data transfer to overlap; (3) an accelerator directly connected to a CPU on the same core.



Figure 1. The proposed architecture
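
The benefit of feature (2) can be illustrated with a simple timing model. The following is a minimal sketch, not the DTU's actual interface or the simulator used in this work: it assumes a double-buffering scheme with two local buffers, and the function names and task times are hypothetical. Each task is a pair of (transfer time, compute time) in arbitrary cycles.

```python
def serial_time(tasks):
    """Total time when each transfer must finish before its task runs,
    and nothing overlaps (no DTU-style overlap)."""
    return sum(transfer + compute for transfer, compute in tasks)


def overlapped_time(tasks):
    """Total time when the data for task i+1 is transferred while task i
    executes. Assumes two local buffers (double buffering): the transfer
    for task i may start only after task i-2 has released its buffer."""
    transfer_done = []  # completion time of each transfer
    compute_done = []   # completion time of each task
    for i, (transfer, compute) in enumerate(tasks):
        prev_transfer = transfer_done[i - 1] if i >= 1 else 0
        buffer_free = compute_done[i - 2] if i >= 2 else 0
        transfer_done.append(max(prev_transfer, buffer_free) + transfer)

        prev_compute = compute_done[i - 1] if i >= 1 else 0
        compute_done.append(max(transfer_done[i], prev_compute) + compute)
    return compute_done[-1] if compute_done else 0


# Hypothetical workload: three tasks, each needing 2 cycles of transfer
# and 3 cycles of computation.
tasks = [(2, 3), (2, 3), (2, 3)]
print(serial_time(tasks), overlapped_time(tasks))
```

In this model, only the first transfer is exposed when computation dominates; later transfers are hidden behind the execution of the preceding task.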

## 3. Performance Results

For the evaluation, a cycle-accurate heterogeneous multi-core architecture simulator was developed. A single-issue in-order SPARC V9 pipeline is assumed as the CPU, and the Hitachi FE-GA [1], a dynamically reconfigurable processor, is assumed as the accelerator.

The performance of the architecture is evaluated using an MP3 encoder program and an AAC encoder program, both parallelized by the OSCAR compiler [2]. For the MP3 encoder program, the proposed architecture with four general-purpose CPU cores and four accelerator cores gives a 21.4 times speedup over sequential execution on a single CPU core. Moreover, the same architecture gives a 10.0 times speedup for the AAC encoder program.
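
For reference, the speedup figures can be read with the usual definition, sketched below. The symbols $T_{\text{seq}}$ (sequential execution time on a single CPU core) and $T_{\text{par}}$ (execution time on the heterogeneous configuration) are notation introduced here for illustration and do not appear in the poster.

$$
S = \frac{T_{\text{seq}}}{T_{\text{par}}}, \qquad \frac{S_{\text{MP3}}}{8} = \frac{21.4}{8} \approx 2.68
$$

Under this reading, the MP3 speedup divided by the eight cores used (four CPU plus four accelerator) exceeds 1. This is possible because the baseline is a single CPU core, while an accelerator core can presumably execute its assigned tasks faster than a CPU core would.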

## 4. Conclusions

In this poster, we present the OSCAR compiler cooperative heterogeneous multi-core architecture. The proposed architecture, which is designed to support the OSCAR compiler, achieves speedups of 21.4 times for an MP3 encoder program and 10.0 times for an AAC encoder program.

### Acknowledgments

This research was partly supported by the "Heterogeneous multi-core" project of NEDO and by the "Ambient SoC Global COE Program of Waseda University" of the Ministry of Education, Culture, Sports, Science and Technology, Japan. This work was supervised by Keiji Kimura and Hironori Kasahara at Waseda University.

### References

- [1] T. Kodama et al. Flexible engine: A dynamic reconfigurable accelerator with high performance and low power consumption. In *Proc. of IEEE Symposium on Low-Power and High-Speed Chips*, pages 393–408, April 2006.
- [2] Y. Wada et al. Parallelizing compiler cooperative heterogeneous multicore. In *Proceedings of Workshop on Software and Hardware Challenges of Manycore Platforms, SHCMP'08*, Jun. 2008.