(Developed by the Intelligent Digital Systems Lab @ Imperial College London)

As machine learning has become a fundamental workload in today's digital systems, there has been increasing demand for highly efficient hardware to run these workloads. The work done at the Intelligent Digital Systems Lab at Imperial College London aims to address this demand with fpgaConvNet: a toolflow for designing Convolutional Neural Network (CNN) accelerators for FPGAs with state-of-the-art performance and efficiency. FPGA devices have long been considered a highly performant and efficient platform, where a designer can exploit the highly configurable fine-grain building blocks to produce customised hardware. However, many current FPGA designs for AI applications do not exploit these features, opting instead for a monolithic accelerator with very little specialisation towards the particular workload. With the fpgaConvNet toolflow, accelerator designs are customised to a specific CNN workload, mapping each operation in the CNN model to a dedicated hardware block. This leads to a deeply pipelined design with extremely high throughput and ultra-low latency. The toolflow can be used to accelerate a number of applications, such as those listed below.

  • Image Classification
  • Object Detection
  • Human Action Recognition
  • Image Segmentation
  • Pose Estimation
  • Key Word Spotting
  • Anomaly Detection

The fpgaConvNet framework operates on an intermediate representation (IR) that describes the hardware of a mapped ML model. The fpgaconvnet-model component of the framework takes ONNX files and creates the fpgaconvnet-ir used to generate hardware. The IR can also be used to generate high-level performance and resource models of the hardware, which makes it fundamental for rapid design space exploration.
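To make the idea of high-level performance and resource models concrete, here is a minimal sketch of how such a model could be derived from an IR-like description. It is not the fpgaconvnet-model API: the `ConvNode` class, its `coarse_in` and `coarse_out` parallelism fields, and the cost formulas are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


def ceil_div(a: int, b: int) -> int:
    """Integer division rounded up."""
    return (a + b - 1) // b


@dataclass
class ConvNode:
    """Hypothetical IR node: one convolution layer mapped to its own hardware block."""
    name: str
    rows: int            # output feature-map height
    cols: int            # output feature-map width
    channels: int        # input channels
    filters: int         # output channels
    kernel: int          # square kernel size
    coarse_in: int = 1   # assumed parallelism over input channels
    coarse_out: int = 1  # assumed parallelism over output channels

    def cycles(self) -> int:
        # Toy latency model: each parallel lane computes one full kernel
        # window per cycle, so the channel and filter loops are divided
        # by the parallelism factors.
        return (self.rows * self.cols
                * ceil_div(self.channels, self.coarse_in)
                * ceil_div(self.filters, self.coarse_out))

    def multipliers(self) -> int:
        # Toy resource model: one multiplier per kernel weight per lane.
        return self.kernel ** 2 * self.coarse_in * self.coarse_out


def pipeline_interval(nodes) -> int:
    """In a streaming pipeline the slowest block sets the throughput."""
    return max(node.cycles() for node in nodes)


def total_multipliers(nodes) -> int:
    """Every layer has dedicated hardware, so resources add up."""
    return sum(node.multipliers() for node in nodes)


# Illustrative layer shapes only, not taken from a real model file.
conv1 = ConvNode("conv1", rows=24, cols=24, channels=1, filters=20, kernel=5)
conv2 = ConvNode("conv2", rows=8, cols=8, channels=20, filters=50, kernel=5,
                 coarse_in=2, coarse_out=5)
print(pipeline_interval([conv1, conv2]), total_multipliers([conv1, conv2]))
```

Because both estimates are simple closed-form expressions, many candidate configurations can be scored without synthesising any hardware, which is what makes this kind of model useful for design space exploration.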

[Figure: fpgaconvnet-model diagram]

[Figure: fpgaconvnet-hls diagram]

Using a configuration generated by the fpgaconvnet-model tool, the corresponding hardware can be generated by the fpgaconvnet-hls tool. This repository contains a selection of highly parametrised hardware building blocks for common CNN layers (Convolution, Pooling, ReLU, etc.). A dataflow architecture is generated by instantiating these building blocks and connecting them together according to the configuration file.
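The step of instantiating and connecting blocks from a configuration can be sketched as follows. The configuration layout, the `SUPPORTED_BLOCKS` set, and the `build_dataflow` function are hypothetical and do not reflect the actual fpgaconvnet-hls file format or code. The sketch also assumes a purely linear network.

```python
# Hypothetical configuration: one entry per layer, listed in dataflow order.
config = [
    {"name": "conv1", "type": "Convolution", "params": {"kernel": 5, "filters": 20}},
    {"name": "relu1", "type": "ReLU", "params": {}},
    {"name": "pool1", "type": "Pooling", "params": {"kernel": 2}},
]

# Layer types for which a parametrised building block is assumed to exist.
SUPPORTED_BLOCKS = {"Convolution", "Pooling", "ReLU"}


def build_dataflow(config):
    """Return (instances, streams) describing a linear chain of hardware blocks.

    instances: list of (name, block_type, params) tuples, one per layer.
    streams:   list of (producer, consumer) pairs connecting the blocks.
    """
    instances = []
    streams = []
    previous = "input"
    for layer in config:
        if layer["type"] not in SUPPORTED_BLOCKS:
            raise ValueError(f"no building block for layer type {layer['type']!r}")
        # Instantiate the block with the parameters from the configuration.
        instances.append((layer["name"], layer["type"], dict(layer["params"])))
        # Connect the previous block's output stream to this block's input.
        streams.append((previous, layer["name"]))
        previous = layer["name"]
    streams.append((previous, "output"))
    return instances, streams


instances, streams = build_dataflow(config)
for producer, consumer in streams:
    print(f"{producer} -> {consumer}")
```

Each layer becomes its own block and data moves between blocks over streams, so all layers can operate concurrently as stages of one pipeline.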

The design space of streaming architectures is immense: even seemingly small networks such as LeNet have 10^13 possible design points, which would take 89 centuries to evaluate exhaustively. The fpgaConvNet toolflow allows designers to automate the design point selection process, using optimisation solvers tailored to the problem. SAMO is a framework that generalises the optimisation problem across streaming architectures, providing a toolflow for both FINN and HLS4ML. The fpgaconvnet-optimiser project is specialised to the fpgaConvNet architecture, finding even better design points.
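As a rough check on these figures, the short calculation below derives the evaluation rate implied by the quoted numbers, and then shows how per-layer choices multiply into a large design space. The layer shapes and the assumption that each layer's only choice is a pair of channel-folding factors are illustrative, not a description of the real fpgaConvNet design space.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Figures quoted in the text.
design_points = 10 ** 13
years = 89 * 100  # 89 centuries

# Evaluation rate implied by the two figures above.
implied_rate = design_points / (years * SECONDS_PER_YEAR)
print(f"implied rate: {implied_rate:.1f} design points per second")


def divisors(n: int) -> list:
    """All positive integers that divide n exactly."""
    return [d for d in range(1, n + 1) if n % d == 0]


def count_design_points(layers) -> int:
    """Count configurations when each layer picks one folding factor for its
    input channels and one for its output channels (a single, assumed knob)."""
    total = 1
    for channels_in, channels_out in layers:
        total *= len(divisors(channels_in)) * len(divisors(channels_out))
    return total


# Hypothetical (input channels, output channels) for two convolution layers.
layers = [(1, 20), (20, 50)]
print("design points with one knob per layer:", count_design_points(layers))
```

The quoted numbers correspond to roughly 36 evaluations per second. The toy count is far smaller than 10^13 because it models only one kind of parameter on two layers. The per-layer counts multiply, so every additional layer or tunable parameter grows the space rapidly.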

[Figure: fpgaconvnet hls diagram]