FPGAs Have the Wrong Abstraction
What is an FPGA?
I don’t think the architecture community has a consensus definition. Let’s entertain three possible answers:
Definition 1: An FPGA is a bunch of transistors that you can wire up to make any circuit you want. It’s like a breadboard at nanometer scale. Having an FPGA is like taping out a chip, but you only need to buy one chip to build lots of different designs—and you take an efficiency penalty in exchange.
I don’t like this answer. It’s neither literally true nor a solid metaphor for how people actually use FPGAs.
It’s not literally true because of course you don’t literally rewire an FPGA—it’s actually a 2D grid of lookup tables connected by a routing network, with some arithmetic units and memories thrown in for good measure. FPGAs do a pretty good job of faking arbitrary circuits, but they really are faking it, in the same way that a software circuit emulator fakes it.
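The lookup-table trick is easy to make concrete. Here is a toy Python sketch, not any real FPGA's architecture, of how a LUT "fakes" an arbitrary gate: the configuration is just a stored truth table, and the same hardware computes a different function when you load different bits.

```python
# A k-input lookup table (LUT) can "fake" any k-input gate by storing
# its truth table. This is the same trick an FPGA plays, in software.

def make_lut(truth_table):
    """truth_table maps input bit-tuples to an output bit."""
    def lut(*bits):
        return truth_table[bits]
    return lut

# Configure one 2-input LUT as XOR and another as AND: same "hardware,"
# different configuration bits.
xor_lut = make_lut({(0, 0): 0, (0, 1): 1, (1, 0): 1, (1, 1): 0})
and_lut = make_lut({(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1})

# "Routing": a half adder is just two LUTs sharing the same inputs.
def half_adder(a, b):
    return xor_lut(a, b), and_lut(a, b)  # (sum, carry)

print(half_adder(1, 1))  # (0, 1)
```

The point of the sketch is that nothing here is a wire: it is a table lookup standing in for a circuit, which is exactly the sense in which an FPGA emulates rather than becomes your design.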
The answer doesn’t work metaphorically because it oversimplifies the way people actually use FPGAs. The next two definitions will do a better job of describing what FPGAs are for.
Definition 2: An FPGA is a cheaper alternative to making a custom chip, for prototyping and low-volume production. If you’re building a router, you can avoid the immense cost of taping out a new chip for it and instead ship an off-the-shelf FPGA programmed with the functionality you need. Or if you’re designing a CPU, you can use an FPGA as a prototype: you can build a real, bootable system around it for testing and snazzy demos before you ship the design off to a fab.
Circuit emulation is the classic, mainstream use case for FPGAs, and it’s the reason they exist in the first place. The point of an FPGA is to take a hardware design, in the form of HDL code, and to buy cheap hardware that behaves the same as the ASIC you would eventually produce. You’re unlikely to take exactly the same Verilog code and make it work both on an FPGA and on real silicon, of course, but at least it’s in the same abstraction ballpark.
Definition 3: An FPGA is a pseudo-general-purpose computational accelerator. Like a GPGPU, an FPGA is good for offloading a certain kind of computation. It’s harder to program than a CPU, but for the right workload, it can be worth the effort: a good FPGA implementation can offer orders-of-magnitude performance and energy advantages over a CPU baseline.
This is a different use case from ASIC prototyping. Unlike circuit emulation, computational acceleration is an emerging use case for FPGAs. It’s behind the recent Microsoft successes accelerating search and deep neural networks. And critically, the computational use case doesn’t depend on FPGAs’ relationship to real ASICs: the Verilog code people write for FPGA-based acceleration need not bear any similarity to the kind of Verilog that would go into a proper tapeout.
These two use cases differ sharply in their implications for programming, compilers, and abstractions. I want to focus on the latter use case, which I’ll call computational FPGA programming. My thesis here is that the current approach to programming computational FPGAs, which borrows the traditional programming model from circuit emulation, is not the right thing. Verilog and VHDL are exactly the right thing if you want to prototype an ASIC. But we can and should rethink the entire stack when the goal is computation.
Let’s be ruthlessly literal. An FPGA is a special kind of hardware for efficiently executing a special kind of software that resembles a circuit description. An FPGA configuration is a funky kind of software, but it’s software, not hardware—it’s a program written for a strange ISA.
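To push the literalism one step further, here is a toy sketch of executing a configuration as if it were a program. The "ISA" is entirely made up, sixteen configuration bits for a single 4-input LUT, and real bitstream formats are secret and vastly more complex; the sketch only shows that "running" a configuration is an ordinary act of software interpretation.

```python
# Hypothetical toy "ISA": a program is 16 configuration bits for one
# 4-input LUT. "Executing" it means indexing into the truth table.

def run_lut(config_bits, inputs):
    """config_bits: 16-bit int; inputs: tuple of 4 bits, MSB first."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return (config_bits >> index) & 1

# One "program" for this machine: 4-input parity (XOR of all inputs).
PARITY4 = 0b0110100110010110

print(run_lut(PARITY4, (1, 0, 1, 1)))  # odd number of ones -> 1
```

Swap in different configuration bits and the same interpreter computes a different function, which is all a bitstream loader does at heart.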
There’s a strong analogy here to GPUs. Before deep learning and before dogecoin, there was a time when GPUs were for graphics. In the early 2000s, people realized they could abuse a GPU as an accelerator for lots of computationally intensive kernels that had nothing to do with graphics: that GPU designers had built a more general kind of machine, for which 3D rendering was just one application.
Computational FPGAs are following the same trajectory. The idea is to abuse this funky hardware not for circuit emulation but to exploit computational patterns that are amenable to circuit-like execution. In the form of an SAT analogy: graphics is to a GPU as circuit emulation is to an FPGA.
To let GPUs blossom into the data-parallel accelerators they are today, people had to reframe the concept of what a GPU takes as input. We used to think of a GPU as taking in an exotic, intensely domain-specific description of a visual effect. We unlocked their true potential by realizing that GPUs execute programs. This realization let GPUs evolve from targeting a single application domain to targeting an entire computational domain. I think we’re in the midst of a similar transition with computational FPGAs.
The world hasn’t settled yet on a succinct description of the fundamental computational pattern that FPGAs are supposed to be good at. But it has something to do with potentially-irregular parallelism, data reuse, and mostly-static data flow. Like GPUs, FPGAs need a hardware abstraction that embodies this computational pattern.
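For one concrete, if simplistic, instance of that pattern, consider a sliding-window sum: the schedule of operations never depends on the data (mostly-static data flow), and each input contributes to several consecutive outputs (data reuse), which is exactly what a shift-register pipeline on an FPGA exploits. A Python model of the access pattern, with an arbitrary window size k:

```python
from collections import deque

def sliding_window_sums(stream, k=3):
    """Sum every length-k window of the input stream.

    The deque acts like a k-deep shift register: each input enters
    once but is reused in k consecutive windows, and the data flow
    is fixed regardless of the values themselves.
    """
    window = deque(maxlen=k)  # oldest element falls off automatically
    out = []
    for x in stream:
        window.append(x)
        if len(window) == k:
            out.append(sum(window))
    return out

print(sliding_window_sums([1, 2, 3, 4, 5]))  # [6, 9, 12]
```

On a CPU this reuse mostly lives in the cache; laid out spatially on an FPGA, it becomes k registers feeding an adder tree, with a new result every cycle.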
What’s missing here is an ISA-like abstraction for the software that FPGAs run.
The problem with Verilog for computational FPGAs is that it does a good job neither as a low-level hardware abstraction nor as a high-level programming abstraction. To argue by contradiction, let’s imagine what it would look like if RTL were playing each of these roles well.
Role 1: Verilog is an ergonomic high-level programming model that targets a lower-level abstraction. In this thought experiment, the ISA for computational FPGAs is something at a lower level of abstraction than RTL: netlists or bitstreams, for example. Verilog is the more productive, high-level programming model that we expose to humans.
Even RTL experts probably don’t believe that Verilog is a productive way to do mainstream FPGA development. It won’t propel programmable logic into the mainstream. RTL design may seem friendly and familiar to veteran hardware hackers, but the productivity gap with software languages is immeasurable.
Role 2: Verilog is a low-level abstraction for FPGA hardware resources. That is, Verilog is to an FPGA as an ISA is to a CPU. It may not be convenient to program in, but it’s a good target for compilers from higher-level languages because it directly describes what goes on in the hardware. And it’s the programming language of last resort for when you need to eke out the last few percentage points of performance.
And indeed, Verilog is the de facto ISA for today’s computational FPGAs. The major FPGA vendors’ toolchains take Verilog as input, and compilers from higher-level languages emit Verilog as their output. Vendors keep bitstream formats secret, so Verilog is as low in the abstraction hierarchy as you can go.
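The shape of that interface is easy to caricature: a compiler whose target "assembly" is Verilog source text. The toy emitter below is purely illustrative; the module it generates is hypothetical, and real high-level-synthesis compilers do enormously more than string formatting. But the output of even the fanciest of them is, in the end, text like this.

```python
def emit_adder(name, width):
    """Emit Verilog source for a combinational width-bit adder."""
    return (
        f"module {name}(\n"
        f"  input  [{width - 1}:0] a,\n"
        f"  input  [{width - 1}:0] b,\n"
        f"  output [{width - 1}:0] sum\n"
        f");\n"
        f"  assign sum = a + b;\n"
        f"endmodule\n"
    )

# The "object code" is just text, handed off to the vendor toolchain.
print(emit_adder("add8", 8))
```

Everything below this textual interface, synthesis, mapping, and place & route, is the vendor's black box, which is precisely the situation the next paragraphs complain about.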
The problem with Verilog as an ISA is that it is too far removed from the hardware. The abstraction gap between RTL and FPGA hardware is enormous: it traditionally contains at least synthesis, technology mapping, and place & route—each of which is a complex, slow process. As a result, the edit/compile/run cycle for RTL programming on FPGAs takes hours or days and, worse still, it’s unpredictable: the deep stack of toolchain stages can obscure the way that changes in RTL will affect the design’s performance and energy characteristics.
A good ISA should directly expose unvarnished truth about the underlying hardware. Like an assembly language, it need not be convenient to program in. But also like assembly, it should be extremely fast to compile and yield predictable results. If there’s going to be a hope of building higher-level abstractions and compilers, they’re going to need such a low-level target that’s free of surprises. RTL is not that target.
I don’t know what abstraction should replace RTL for computational FPGAs. Practically, replacing Verilog may be impossible as long as the FPGA vendors keep their lower-level abstractions secret and their sub-RTL toolchains proprietary. The long-term resolution to this problem might only come when the hardware evolves, as GPUs once did:
If computational FPGAs are accelerators for a particular class of algorithmic patterns, there’s no reason to believe that today’s FPGAs are the ideal implementation of that goal. A new category of hardware that beats FPGAs at their own game could bring with it a fresh abstraction hierarchy. The new software stack should dispense with FPGAs’ circuit emulation legacy and, with it, their RTL abstraction.
Source: Cornell.edu
Powered by NewsAPI.org