The results in this table are from our submission to the MLPerf Tiny benchmark, round v1.1. The submission demonstrates the versatility of the fpgaConvNet toolflow by targeting a range of low-cost FPGAs whilst achieving ultra-low latency on each of them. This performance comes from exploiting the reconfigurability of FPGAs, which allows the tool to generate an accelerator design tailored to each specific task and device. fpgaConvNet showcases the potential of FPGAs for TinyML applications: it achieves performance similar to that of ASICs whilst retaining the programmability of MCUs.
Device | Task | Latency | LUT | DSP | BRAM | Freq. | Link |
---|---|---|---|---|---|---|---|
ZC706 | Image Classification | 0.15 ms | 108K | 564 | 281 | 187 MHz | link |
ZC706 | Visual Wake Word | 0.72 ms | 133K | 564 | 366 | 200 MHz | link |
ZedBoard | Image Classification | 0.41 ms | 47K | 211 | 93 | 143 MHz | link |
ZedBoard | Visual Wake Word | 9.49 ms | 34K | 189 | 123 | 111 MHz | link |
ZedBoard | Keyword Spotting | 0.32 ms | 37K | 188 | 97 | 143 MHz | link |
ZyBo | Image Classification | 3.15 ms | 16K | 78 | 36 | 125 MHz | link |
ZyBo | Keyword Spotting | 2.15 ms | 15K | 60 | 16 | 125 MHz | link |
Cora-Z7 | Keyword Spotting | 4.21 ms | 13K | 55 | 28 | 143 MHz | link |
Instructions on how to use the bitstream can be found in the MLPerf-Tiny repo.
The results presented in this table are from our recently published research papers.
Year | Task | Network | Accuracy | Latency | Throughput | Device | LUT | DSP | BRAM18k | URAM | Freq. | Cite |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2024 | CamVid | UNet | 71.75% | - | 16.96fps (2206GOP/s) | VCU1525 | 993k | 6019 | 3715 | 864 | 200MHz | link |
2024 | CamVid | UNet | 71.75% | - | 21.21fps (2758GOP/s) | U200 | 1040k | 6062 | 3654 | 864 | 250MHz | link |
2024 | CamVid | UNet | 71.75% | - | 1.28fps (166GOP/s) | ZCU102 | 213k | 1461 | 1368 | - | 200MHz | link |
2024 | COCO | YOLOv8n | 35.10% | - | 184.27fps (808GOP/s) | VCU118 | 543k | 5061 | 1813 | 431 | 250MHz | link |
2024 | UCF101 | X3D-M | 96.29% | - | 13.44fps (86GOP/s) | ZCU102 | 235k | 932 | 857 | - | 200MHz | link |
2024 | BraTS2020 | UNet3D | 85.34% | - | 1.75fps (1595GOP/s) | U200 | 289k | 5677 | 2980 | 528 | 250MHz | link |
2024 | ImageNet | ResNet18 | 70.3% | 27.0ms | - | ZC706 | 38k | 150 | 1709 | - | 200MHz | link |
2024 | ImageNet | ResNet18 | 70.5% | 7.0ms | - | ZCU102 | 127k | 1251 | 2318 | - | 200MHz | link |
2024 | ImageNet | ResNet18 | 70.0% | 1.3ms | - | U50 | 704k | 5817 | 2490 | 576 | 250MHz | link |
2024 | ImageNet | ResNet50 | 76.0% | 3.4ms | - | U50 | 867k | 3807 | 2698 | 640 | 250MHz | link |
2024 | ImageNet | ResNet50 | 76.0% | 1.8ms | - | U250 | 1714k | 7804 | 4025 | 967 | 250MHz | link |
2024 | ImageNet | MobileNetV2 | 65.6% | 4.8ms | - | ZC706 | 219k | 391 | 1084 | - | 200MHz | link |
2024 | ImageNet | MobileNetV2 | 65.7% | 2.3ms | - | ZCU102 | 273k | 1222 | 1428 | - | 200MHz | link |
2023 | COCO | YOLOv3-Tiny | 33.9% | 14.3ms | 418.9GOP/s | VCU110 | 127k | 1780 | 4181 | - | 220MHz | link |
2023 | COCO | YOLOv3-Tiny | 33.9% | 6.8ms | 875.7GOP/s | VCU118 | 431k | 6687 | 4296 | 90 | 255MHz | link |
2023 | COCO | YOLOv5s | 56.2% | 46.4ms | 392.0GOP/s | VCU110 | 602k | 1794 | 3776 | - | 200MHz | link |
2023 | COCO | YOLOv5s | 56.2% | 14.9ms | 1219.8GOP/s | VCU118 | 117k | 5077 | 4052 | 33 | 270MHz | link |
2023 | COCO | YOLOv8s | 61% | 122.8ms | 248.2GOP/s | VCU110 | 629k | 1767 | 5565 | - | 200MHz | link |
2023 | COCO | YOLOv8s | 61% | 24.5ms | 1244GOP/s | VCU118 | 1023k | 6815 | 1322 | 713 | 240MHz | link |
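
The Throughput column mixes frames per second and GOP/s, so it can help to reduce each row to an implied workload per inference when comparing devices. The Python sketch below does this for the rows that report both quantities. It is only an illustration: the function names are hypothetical, and for the 2023 rows it assumes that one inference is in flight at a time, so that GOP per inference = GOP/s × latency. The source does not state this assumption.

```python
# Implied workload per inference, derived from the figures in the table above.
# All helper names here are illustrative and not part of fpgaConvNet.

# 2024 rows that report both fps and GOP/s: (network, device, fps, GOP/s)
FPS_ROWS = [
    ("UNet", "VCU1525", 16.96, 2206),
    ("UNet", "U200", 21.21, 2758),
    ("UNet", "ZCU102", 1.28, 166),
    ("YOLOv8n", "VCU118", 184.27, 808),
    ("X3D-M", "ZCU102", 13.44, 86),
    ("UNet3D", "U200", 1.75, 1595),
]

# 2023 rows that report both latency and GOP/s: (network, device, latency_ms, GOP/s)
LATENCY_ROWS = [
    ("YOLOv3-Tiny", "VCU110", 14.3, 418.9),
    ("YOLOv3-Tiny", "VCU118", 6.8, 875.7),
    ("YOLOv5s", "VCU110", 46.4, 392.0),
    ("YOLOv5s", "VCU118", 14.9, 1219.8),
    ("YOLOv8s", "VCU110", 122.8, 248.2),
    ("YOLOv8s", "VCU118", 24.5, 1244),
]


def gop_per_frame(fps: float, gops: float) -> float:
    """GOP per frame = (GOP/s) / (frames/s)."""
    return gops / fps


def gop_per_inference(latency_ms: float, gops: float) -> float:
    """GOP per inference = (GOP/s) * latency in seconds.

    Assumes a single inference in flight (not stated in the source).
    """
    return gops * latency_ms / 1000.0


def implied_workloads() -> dict:
    """Map (network, device) to the implied GOP per inference."""
    result = {}
    for network, device, fps, gops in FPS_ROWS:
        result[(network, device)] = gop_per_frame(fps, gops)
    for network, device, latency_ms, gops in LATENCY_ROWS:
        result[(network, device)] = gop_per_inference(latency_ms, gops)
    return result


if __name__ == "__main__":
    for (network, device), gop in implied_workloads().items():
        print(f"{network:12s} {device:8s} {gop:8.2f} GOP per inference")
```

With the figures above, the three UNet rows all come out at roughly 130 GOP per frame, and each YOLO pair agrees to within about 1% across its two devices. This suggests that the throughput and latency columns are mutually consistent under the stated assumption.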