Implementing the Generator of DCGAN on FPGA
Yu, Yifan (2018)
Metropolia Ammattikorkeakoulu
All rights reserved
The permanent address of this publication is
https://urn.fi/URN:NBN:fi:amk-201805219366
Abstract
The parallel nature of FPGAs makes them promising candidates for accelerating machine learning tasks. The purpose of this project was to study the acceleration capabilities of FPGAs for deep convolutional neural networks.
The project was carried out by implementing a generative model on the Nexys 4 trainer board with an Artix-7 FPGA from Xilinx. The pre-trained model belongs to the popular family of Generative Adversarial Networks (GANs), which can create realistic images that resemble the training data. The core was written in Verilog, but several Xilinx IPs were also used to facilitate the design. Xilinx Vivado 2017.4 was used as the development platform. Both fixed-point and floating-point arithmetic were used to achieve a balance between efficiency and accuracy.
With simplicity as the main goal of the design, some optimizations were deliberately avoided. This paper serves as detailed documentation of the design and implementation process. Transposed convolution, the core operation of the generative model, is described. A method to map network weights and biases from a high-precision floating-point representation to a low-precision integer representation, known as quantization, is derived. The quantization scheme leads to an efficient implementation of the General Matrix Multiplication (GEMM) operation, which is at the heart of neural network computations. In conclusion, possible optimization methods are discussed as future work.
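The quantization idea summarized above can be illustrated with a small sketch: map floating-point values to low-precision signed integers with a per-tensor scale, carry out the multiply-accumulate entirely in integer arithmetic (the GEMM core), and rescale the result afterwards. The function names and the symmetric, per-tensor scheme shown here are illustrative assumptions, not the exact derivation from the thesis.

```python
# Sketch of symmetric quantization followed by an integer dot product.
# This is an illustrative assumption of the general technique; the
# thesis derives its own mapping for the FPGA implementation.

def quantize(values, bits=8):
    """Map floats to signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1]."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

def int_dot(qa, qb, scale_a, scale_b):
    """Accumulate in pure integer arithmetic, then rescale to a float."""
    acc = sum(a * b for a, b in zip(qa, qb))  # integer-only GEMM inner loop
    return acc * scale_a * scale_b

weights = [0.5, -1.25, 0.75]
inputs = [1.0, 2.0, -0.5]
qw, sw = quantize(weights)
qx, sx = quantize(inputs)
approx = int_dot(qw, qx, sw, sx)
exact = sum(w * x for w, x in zip(weights, inputs))
```

Because the scales factor out of the sum, the hardware only needs integer multipliers and adders in the inner loop, with a single floating-point rescale at the end.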