Report on the First Workshop on Hybrid Multicore Computing in conjunction with HiPC 2010 @ Goa, India
The workshop was held on Dec 19, 2010 on the sidelines of the High Performance Computing (HiPC) conference. Today's computers have multiple complex cores in the CPU, many simpler cores in a GPU, and may have other accelerators attached. Achieving the best performance from the system (in terms of speed, latency, and energy efficiency) by coordinating all available sources of computation should be the goal of the algorithms and applications of tomorrow. That is the focus of this workshop.
The half-day workshop had a keynote talk by Peter Hofstee of IBM, the chief architect of the CellBE processor, titled "Heterogeneous Processors: The Cell Broadband Engine". Peter elaborated on the choice of embedding distinct accelerators in the die to effectively utilize the additional silicon real estate, as frequency and power walls forced the industry to leave its traditional path. Powerful software tools are absolutely essential to make such processors useful to a wide range of users. He predicted that the additional transistors made available by Moore's law will go into task-specific accelerators or even special ASICs. Additional transistors may even migrate to data centres to form large, heterogeneous compute clouds that customers will access through lower-end devices.
Viktor Prasanna of the University of Southern California gave an invited talk on "FPGA Accelerators for High Throughput Applications". Viktor traced the development of reconfigurability from its early days and explained how today's FPGAs can provide high, reconfigurable compute power. He explained his group's work on packet routing, where an SRAM+FPGA solution becomes highly viable given the heavy power requirements of traditional TCAMs. He also outlined an approach to regular expression evaluation for deep packet inspection on the FPGA.
Dinesh Manocha of the University of North Carolina gave an invited talk on "GPU (Many-Core) Computing for Mass-Market Applications". Dinesh traced the development and exploitation of the GPU as a computing engine for applications like FFT and sorting. He outlined some of his group's efforts at using GPUs for robotics, crowd simulation, and computationally very hard problems such as sound rendering. He also stressed the difficulty of performance tuning on today's GPUs.
Jacob Barhen of ORNL presented the contributed paper "FFT-Based Spatio-Temporal Noise Covariance Matrix Inversion on Hybrid Multicore Processor Systems".
Lively discussions with the audience enhanced the workshop greatly. The main concern was the programmability of hybrid, multicore systems. Suitable programming models and tools for them are lacking, especially for extracting good performance from the hardware. Compilers, debuggers, emulators, and other tools that help application developers achieve high performance on these platforms are absolutely essential for their wide adoption.
Please find below the program and the organizing committee details.
8:30 am: Welcome Address
8:40 am: Keynote talk by Peter Hofstee (IBM)
Title: Heterogeneous Processors: The Cell Broadband Engine
Abstract: This talk will provide a review of four years of experience with the Cell Broadband Engine. This talk will review the original motivation for introducing the architecture, discuss the various processor and system implementations, and highlight key application areas. The second part of the talk will discuss hybrid and heterogeneous system architecture in general and take a stand on how to program such systems.
Biography: H. Peter Hofstee currently works at the IBM Austin Research Laboratory on workload-optimized and hybrid systems. Peter has degrees in theoretical physics (MS, Rijks Universiteit Groningen, Netherlands) and computer science (PhD, California Inst. of Technology). At IBM Peter has worked on microprocessors, including the first CMOS processor to demonstrate GHz operation (1997), and he was the chief architect of the synergistic processor elements in the Cell Broadband Engine, known from its use in the Sony Playstation 3 and the Roadrunner supercomputer that first broke the 1 Petaflop Linpack benchmark. His interests include VLSI, multicore and heterogeneous microprocessor architecture, security, system design and programming. Peter has over 100 patents issued or pending.
9:40 am: Jacob Barhen, Travis Humble, Pramita Mitra, Charlotte Kotas, Neena Imam and Bryan Schleck. FFT-Based Spatio-Temporal Noise Covariance Matrix Inversion on Hybrid Multicore Processor Systems
10:20 am: Break
10:40 am: Invited Talk by Viktor K. Prasanna, University of Southern California
Title: FPGA Accelerators for High Throughput Applications
Abstract: Reconfigurable devices and systems have evolved dramatically over the past decade. Recently, several state-of-the-art high-end platforms, including high-end routers, have incorporated FPGAs (Field Programmable Gate Arrays) for application acceleration. This talk explores architectures and algorithms for accelerating core network functions including deep packet inspection and packet classification in Internet routers. We illustrate the performance improvements for such systems and demonstrate the suitability of FPGAs for these computations. We also propose energy-efficient designs to realize the “Green Internet” vision. We show that SRAM-based solutions combined with FPGA-based architectures lead to high throughput as well as reduced power dissipation compared with the state-of-the-art solutions based on TCAMs. We conclude by highlighting the challenges in further exploiting this technology for such applications.
Biography: Viktor K. Prasanna (ceng.usc.edu/~prasanna) is Charles Lee Powell Chair in Engineering in the Ming Hsieh Department of Electrical Engineering and Professor of Computer Science at the University of Southern California. He is the executive director of the USC-Infosys Center for Advanced Software Technologies (CAST) and director of the Center for Energy Informatics. He is the associate director of the USC-Chevron Center of Excellence for Research and Academic Training on Interactive Smart Oilfield Technologies. His research interests include parallel and distributed systems including networked sensor systems, embedded systems, configurable architectures and high performance computing. He served on the editorial boards of the Journal of Parallel and Distributed Computing, Proceedings of the IEEE, IEEE Transactions on VLSI Systems, and IEEE Transactions on Parallel and Distributed Systems. He served as the Editor-in-Chief of the IEEE Transactions on Computers during 2003-06. Prasanna was the founding Chair of the IEEE Computer Society Technical Committee on Parallel Processing. He is the steering chair of the IEEE International Conference on High Performance Computing (www.hipc.org). He is a Fellow of the IEEE, the ACM and AAAS. He is a recipient of the 2009 Outstanding Engineering Alumnus Award from the Pennsylvania State University.
11:20 am: Invited Talk by Dinesh Manocha, Department of CS, University of North Carolina at Chapel Hill
Title: GPU (Many-Core) Computing for Mass-Market Applications
Abstract: For years the performance and functionality of graphics processors (GPUs) have been increasing at a faster pace than Moore's Law. The latest GPUs consist of more than 3 billion transistors and can offer a peak performance of a few TFlops. They consist of 500+ stream processors, offer high memory bandwidth and have a different programming model compared to CPUs. In this talk, we will give an overview of our work on GPUs as many-core accelerators for mass-market applications, including physics-based simulation, database computations, sorting and geometric algorithms. This includes development of new methods that exploit the architectural characteristics of GPUs, explicit balancing of work units with very lightweight synchronization between the cores, and using them as many-core stream processors with appropriate stream compaction. We will also highlight their benefit for interactive sound rendering and their use for real-time planning and navigation of physical robots.
Biography: Dinesh Manocha is currently a Phi Delta Theta/Mason Distinguished Professor of Computer Science at the University of North Carolina at Chapel Hill. He received his Ph.D. in Computer Science at the University of California at Berkeley in 1992. He has published more than 300 papers in computer graphics, geometric computation, robotics and many-core computing and received 12 best-paper awards. Some of the software systems developed by his group on collision and geometric computations, interactive rendering, and GPU-based algorithms have been downloaded by more than 100K users and widely licensed by commercial vendors. Manocha has served on the program committees of more than 100 leading conferences and on the editorial boards of more than 10 leading journals. He is an ACM Fellow.
12:00 pm: Discussions
12:45 pm: Concluding Remarks
The workshop is jointly organized by IIIT Hyderabad and IBM, India.
- P. J. Narayanan, IIIT Hyderabad, India.
- R. Govindarajulu, IIIT Hyderabad, India.
- Manish Gupta, IBM IRL, India.
- Suresh Purayat, IBM Bangalore, India.
Current Program Committee
- Kishore Kothapalli, IIIT Hyderabad, India (co-chair)
- Suresh Purini, IIIT Hyderabad, India (co-chair)
- Mainak Chaudhuri, IIT Kanpur, India
- Arun Chauhan, Indiana University, USA
- Jatin Chhugani, Intel, USA
- Pradeep Dubey, Intel, USA
- Manjunath Kudlur, Nvidia, USA
- Subodh Kumar, IIT Delhi, India
- Kamesh Madduri, Lawrence Berkeley National Laboratory, USA
- Krishna Nandivada, IBM IRL, India
- P. J. Narayanan, IIIT Hyderabad, India
- Yogish Sabharwal, IBM IRL, India