Since the initial conception of the fpgaConvNet toolflow, many research projects have contributed to and utilised the framework. The core development of the toolflow has led to innovations in Streaming Architecture design as well as Design Space Exploration. This in turn has enabled novel contributions in several application domains, broadening the scope of the toolflow. The growing list of research outcomes is given below.
Petros Toupas, Zhewen Yu, Christos-Savvas Bouganis and Dimitrios Tzovaras
Partially offloading both weights and activations to off-chip memory without penalising the computation pipeline.
Zhewen Yu and Christos-Savvas Bouganis
A memory management methodology for balancing the allocation of weights both on-chip and off-chip.
Alexander Montgomerie-Corcoran, Petros Toupas, Zhewen Yu, and Christos-Savvas Bouganis
Design space optimisations and hardware support for state-of-the-art object detection models.
Zhewen Yu and Christos-Savvas Bouganis
Tensor Decomposition methods for approximating CNN parameters and improving accelerator performance.
Petros Toupas, Christos-Savvas Bouganis, Dimitrios Tzovaras
Optimising and mapping state-of-the-art 3D-CNN models for human action recognition (HAR) while aiming for throughput-oriented designs.
Alexander Montgomerie-Corcoran*, Zhewen Yu*, Jianyi Cheng, Christos-Savvas Bouganis
This work addresses the challenges associated with exploiting post-activation sparsity for performance gains in streaming CNN accelerators.
Petros Toupas, Christos-Savvas Bouganis, Dimitrios Tzovaras
This study bridges the gap between recent developments in computer vision on videos and their deployment and applications on FPGAs by optimising and mapping a state-of-the-art 3D-CNN model (X3D) for human action recognition (HAR).
Benjamin Biggs, Christos-Savvas Bouganis, George A. Constantinides
ATHEENA looks into using early exit networks with fpgaConvNet, and the challenges associated with handling early exit branches.
Petros Toupas*, Alexander Montgomerie-Corcoran*, Christos-Savvas Bouganis, Dimitrios Tzovaras
This work develops a new architecture that follows the design principles of fpgaConvNet and adds runtime parameter reconfiguration. Because it avoids full bitstream reconfiguration, the architecture is suitable for low-latency applications such as Human Action Recognition (HAR).
Alexander Montgomerie-Corcoran*, Zhewen Yu* and Christos-Savvas Bouganis
This work builds on fpgaConvNet's design space exploration capabilities by proposing two optimisers (simulated annealing and rule-based) and by expanding the scope to other FPGA-based ML accelerator frameworks, namely FINN and HLS4ML.
Zhewen Yu and Christos-Savvas Bouganis
This work looks into methods of compressing ML models by using Singular Value Decomposition (SVD) to reduce the number of weight parameters required. fpgaConvNet was used to show how this method benefits ML accelerators.
Alexander Montgomerie-Corcoran and Christos-Savvas Bouganis
This paper introduces a novel coding scheme which exploits the properties of feature maps to reduce the activity along off-chip memory data buses, leading to lower system power consumption.
Alexander Montgomerie-Corcoran and Christos-Savvas Bouganis
This work explores the power consumption aspect of fpgaConvNet, producing high-level models of how design choices affect the power consumption of the accelerator.
Stylianos I. Venieris and Christos-Savvas Bouganis
The scope of fpgaConvNet is expanded to support more irregular network styles such as GoogLeNet and ResNet, by including element-wise and concatenation layers in the design. This work addresses the design space exploration aspect of including multiple branches in the model.
Stylianos I. Venieris and Christos-Savvas Bouganis
This work explores methods of targeting latency for fpgaConvNet by constraining the design to a single bitstream, and making use of runtime weight parameter reconfiguration.
Stylianos I. Venieris and Christos-Savvas Bouganis
This is the original paper detailing the design principles of fpgaConvNet, such as the application of Synchronous Dataflow (SDF) performance modelling for streaming architectures, and high-level resource estimation. This work also explores the use of bitstream reconfiguration for high-throughput applications.