Auto-generating diverse heterogeneous designs
File(s)
RAW_2024_metaprog.pdf (638.23 KB)
Accepted version
Author(s)
Vandebon, Jessica
De Figueiredo Coutinho, Jose
Luk, Wayne
Type
Conference Paper
Abstract
This paper presents a novel architecture for end-to-end design automation, facilitating high-level design portability across diverse technologies. We introduce programmatic, customizable and reusable design-flows capable of generating multiple implementations (e.g., CPU, GPU, FPGA) from a single technology-agnostic high-level application source. Notably, our approach incorporates design-flow branch points and automated path selection strategies, mitigating the manual effort currently needed for efficient design production, particularly for heterogeneous platforms. To validate our approach, we implement optimizing design-flows tailored to different hardware platforms. Through experiments on five AI and HPC benchmarks, we demonstrate significant speed improvements compared to single-threaded CPU execution. Our approach generates multi-thread CPU, CPU+FPGA, and CPU+GPU designs from a single high-level source description, achieving speedups of up to 30 times for OpenMP multi-thread CPU, 32 times for oneAPI CPU+FPGA, and 779 times for HIP CPU+GPU designs. We also showcase cost-effective implementations targeting heterogeneous computing platforms. Additionally, these performance advancements are accompanied by gains in developer productivity, quantified based on added lines of code.
Date Issued
2024
Date Acceptance
2024-03-21
Citation
2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), 2024, pp.116-123
ISBN
979-8-3503-6460-6
Publisher
IEEE
Start Page
116
End Page
123
Journal / Book Title
2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Copyright Statement
Copyright © 2024 IEEE. This is the author’s accepted manuscript made available under a CC-BY licence in accordance with Imperial’s Research Publications Open Access policy (www.imperial.ac.uk/oa-policy)
License URL
Identifier
https://www.computer.org/csdl/proceedings-article/ipdpsw/2024/646000a116/1YTsb7Dveik
Source
31st Reconfigurable Architectures Workshop (IEEE IPDPS 2024)
Publication Status
Published
Start Date
2024-05-27
Finish Date
2024-05-28
Coverage Spatial
San Francisco, California, USA
Date Publish Online
2024-05-27