‎"Behind every stack of books there is a flood of knowledge."

Hardware Software Co-Design, Partitioning and Virtual Platform Resources


Hardware Software Co-Design Resources

Hardware Software Partitioning Resources

High-Level Synthesis Resources

Case Studies



GEZEL Documentation

GEZEL Structure

A Practical Introduction to Hardware/Software Codesign
2nd Edition

Patrick Schaumont
2nd Edition, 2012, xxii+480p
ISBN: 978-1-4614-3736-9
December 2012


This textbook provides an introduction to embedded systems design, with emphasis on integration of custom hardware components with software. The key problem addressed in the book is the following: “How can an embedded systems designer strike a balance between flexibility and efficiency?” The book describes how combining hardware design with software design leads to a solution to this important computer engineering problem. The book covers four topics in hardware/software codesign: Fundamentals, the Design Space of Custom Architectures, the Hardware/software Interface and Application Examples. The book comes with an associated design environment that helps the reader to perform experiments in hardware/software codesign. Each chapter also includes exercises and further reading suggestions.

Table of Contents [PDF]

  • Preface
  • Part I: Basic Concepts
  • Part II: The Design Space of Custom Architectures (Dedicated to Flexible)
    • Chapter 5: Finite State Machine with Datapath
    • Chapter 6: Microprogrammed Architectures
    • Chapter 7: General-Purpose Embedded Cores
    • Chapter 8: System-on-Chip
  • Part III: Hardware-Software Interfaces
    • Chapter 9: Principles of Hardware/Software Communication
    • Chapter 10: On-Chip Busses
    • Chapter 11: Microprocessor Interfaces
    • Chapter 12: Hardware Interfaces
  • Part IV: Applications
    • Chapter 13: Trivium Crypto-Coprocessor
    • Chapter 14: AES Co-processor
    • Chapter 15: Cordic Co-processor
  • Appendix: Hands-on Experiments in GEZEL

Instructor Resources

[PPT] P. Schaumont, “GEZEL Modeling and Simulation
[PPT] P. Schaumont, “GEZEL Ipblock Design
[PDF] P. Schaumont, “A Senior Level Course in Hardware/Software Codesign,” IEEE Transactions on Education, Special Issue on Micro-Electronic Systems Education, 51(3):306-311, August 2008.
[PDF] P. Schaumont, “Hardware/Software Codesign is a Starting Point in Embedded Systems Architecture Education“,ARTIST Workshop on Embedded Systems Education (WESE 2008), October 2008.
[PDF] P. Schaumont, “A Senior-Level Course in Hardware-Software Codesign,” Micro-electronic System Education Conference (MSE), June 2007.
Design Methodologies
[PDF] S. Iyer, J. Zhang, Y. Yang, and P. Schaumont, “A Unifying Interface Abstraction for Accelerated Computing in Sensor Nodes,” 2011 Electronic System Level Synthesis Conference, San Diego, June 2011.
[PDF] J. Zhang, Y. Tang, S. Hirve, S. Iyer, P. Schaumont, Y. Yang, “A Software-Hardware Emulator for Sensor Networks,” 8th Annual IEEE Communications Society Conference on Sensor, Mesh and Ad Hoc Communications and Networks, June 2011. Best Paper Award
[PDF] X. Guo, P. Schaumont, “Optimizing the Control Hierarchy of an ECC Coprocessor Design on an FPGA based SoC Platform,” 5th International Workshop on Applied Reconfigurable Computing (ARC2009), LNCS5453, pp. 169-180, Springer Verlag, March 2009.
[PDF] Sakiyama, K., De Mulder, E., Preneel, B., and Verbauwhede, I. ,”Side-channel resistant system-level design flow for public-key cryptography” In Proceedings of the 17th ACM Great Lakes Symposium on VLSI (Stresa-Lago Maggiore, Italy, March 11 – 13, 2007). GLSVLSI ’07. ACM, New York, NY, 144-147.
[PDF] Tiri, K. and Schaumont, P. ,”Changing the odds against masked logic,” In Proceedings of the 13th international Conference on Selected Areas in Cryptography (Montreal, Canada, August 17 – 18, 2006). E. Biham and A. M. Youssef, Eds. Lecture Notes In Computer Science. Springer-Verlag, Berlin, Heidelberg, 134-146.
[PDF] Schaumont, P. and Verbauwhede, I. “A Component-Based Design Environment for ESL Design,” IEEE Des. Test 23, 5 (Sep. 2006), 338-347.
[PDF] P. Schaumont, D. Hwang, S. Yang, I. Verbauwhede, “Multi-level Design Validation in a Secure Embedded System,” IEEE Transactions on Computers, special issue HLDVT 2005, October 2006.
[PDF] Schaumont, P., Ching, D., and Verbauwhede, I. “An interactive codesign environment for domain-specific coprocessors,” ACM Trans. Des. Autom. Electron. Syst. 11, 1 (Jan. 2006), 70-87.
[PDF] K. Tiri, P. Schaumont, “Changing the odds against masked logic,” 13th Annual Workshop on Selected Areas in Cryptography, Montreal, Canada, 2006.
[PDF] Schaumont, P., Lai, B. C., Qin, W., and Verbauwhede, I. “Cooperative multithreading on Embedded multiprocessor architectures enables energy-scalable design,” In Proceedings of the 42nd Annual Design Automation Conference (Anaheim, California, USA, June 13 – 17, 2005). DAC ’05. ACM, New York, NY, 27-30.
[PDF] Yang, S., Schaumont, P., and Verbauwhede, I. “Microcoded coprocessor for embedded secure biometric authentication systems,” In Proceedings of the 3rd IEEE/ACM/IFIP international Conference on Hardware/Software Codesign and System Synthesis (Jersey City, NJ, USA, September 19 – 21, 2005). CODES+ISSS ’05. ACM, New York, NY, 130-135.
[PDF] Villa, O., Schaumont, P., Verbauwhede, I., Monchiero, M., and Palermo, G. “Fast Dynamic Memory Integration in Co-Simulation Frameworks for Multiprocessor System on-Chip,” In Proceedings of the Conference on Design, Automation and Test in Europe – Volume 2 (March 07 – 11, 2005). Design, Automation, and Test in Europe. IEEE Computer Society, Washington, DC, 804-805.
[PDF] Hwang, D. D., Yang, S., Verbauwhede, I., and Schaumont, P. “Multilevel design validation in a secure embedded system,” In Proceedings of the High-Level Design Validation and Test Workshop, 2005. on Tenth IEEE international (November 30 – December 02, 2005). HLDVT. IEEE Computer Society, Washington, DC, 203-210.
[PDF] B. Lai, P. Schaumont, and I. Verbauwhede, “Energy and Performance Analysis of Mapping Parallel Multi-threaded Tasks for An On-Chip Multi-Processor System,” In 23rd IEEE International Conference on Computer Design (ICCD 2005), IEEE, pp. 102-104, 2005.
[PDF] P. Schaumont, K. Sakiyama, A. Hodjat, I. Verbauwhede, “Embedded software integration for coarse-grain reconfigurable architectures,” 2004 Reconfigurable Architectures Workshop (RAW 2004), April 2004
[PDF] Schaumont, P. and Verbauwhede, I. 2004. “Interactive Cosimulation with Partial Evaluation,” In Proceedings of the Conference on Design, Automation and Test in Europe – Volume 1 (February 16 – 20, 2004). Design, Automation, and Test in Europe. IEEE Computer Society, Washington, DC, 10642.
[PDF] Verbauwhede, I. and Schaumont, P., “The happy marriage of architecture and application in next-generation reconfigurable systems,” In Proceedings of the 1st Conference on Computing Frontiers (Ischia, Italy, April 14 – 16, 2004). CF ’04. ACM, New York, NY, 363-376.
[PDF] Verbauwhede, I., Schaumont, P., Piguet, C., and Kienhuis, B. 2004, “Architectures and Design Techniques for Energy Efficient Embedded DSP and Multimedia Processing,” In Proceedings of the Conference on Design, Automation and Test in Europe – Volume 2 (February 16 – 20, 2004). Design, Automation, and Test in Europe. IEEE Computer Society, Washington, DC, 20988.
Schaumont, P. R. , “Domain-Specific Codesign for Embedded Security,” Doctoral Thesis. UMI Order Number: AAI3169146., University of California at Los Angeles.
[PDF] P. Schaumont, and I. Verbauwhede, “Domain specific tools and methods for application in security processor design,” Kluwer Journal for Design Automation of Embedded Systems 7(4), pp. 365-383, 2002.
Formal Aspects
[LNK] Backes, M. and Koepf, B. “Formally Bounding the Side-Channel Leakage in Unknown-Message Attacks,” In Proceedings of the 13th European Symposium on Research in Computer Security: Computer Security (Málaga, Spain, October 06 – 08, 2008). S. Jajodia and J. Lopez, Eds. Lecture Notes In Computer Science, vol. 5283. Springer-Verlag, Berlin, Heidelberg, 517-532.
[LNK] Hansen, M. R., Madsen, J., and Brekling, A. W ,”Semantics and verification of a language for modelling hardware architectures,” In Formal Methods and Hybrid Real-Time Systems, C. B. Jones, Z. Liu, and J. Woodcock, Eds. Lecture Notes In Computer Science, vol. 4700. Springer-Verlag, Berlin, Heidelberg, 300-319.
[LNK] Koepf, B. and Basin, D, “An information-theoretic model for adaptive side-channel attacks“, In Proceedings of the 14th ACM Conference on Computer and Communications Security (Alexandria, Virginia, USA, October 28 – 31, 2007). CCS ’07. ACM, New York, NY, 286-296.
[PDF] Schaumont, P., Shukla, S., and Verbauwhede, I. “Design with race-free hardware semantics,” In Proceedings of the Conference on Design, Automation and Test in Europe: Proceedings (Munich, Germany, March 06 – 10, 2006). Design, Automation, and Test in Europe. European Design and Automation Association, 3001 Leuven, Belgium, 571-576.
[LNK] Boris Koepf and David A. Basin, “Timing-Sensitive Information Flow Analysis for Synchronous Systems,” ESORICS 2006, 243-262.
[LNK] Schaumont, P., Sandeep Shukla, and Verbauwhede, I. “Extended abstract: a race-free hardware modeling language,” In Proceedings of the 2nd ACM/IEEE international Conference on Formal Methods and Models For Co-Design (July 11 – 14, 2005). Conference on Formal Methods and Programming Models for Codesign. IEEE Computer Society, Washington, DC, 255-256.
Crypto Hardware
[PDF] Knezevic, M. and Verbauwhede, I. ,”Hardware evaluation of the Luffa hash family,” In Proceedings of the 4th Workshop on Embedded Systems Security (Grenoble, France, October 15 – 15, 2009). WESS ’09. ACM, New York, NY, 1-6.
[PDF] M. Knezevic, K. Sakiyama, Y. K. Lee, and I. Verbauwhede, “On the High-Throughput Implementation of RIPEMD-160 Hash Algorithm,” In Proceedings of the IEEE 19th International Conference on Application-specific Systems, Architectures and Processors (ASAP 2008), 6 pages, 2008.
[PDF] X. Guo, Z. Chen, P. Schaumont, “Energy and Performance Evaluation of an FPGA-based SoC Platform with AES and PRESENT Coprocessors“, International Workshop on Systems, Architectures, Modeling, and Simulation (SAMOS 2008), July 2008.
[PDF] J. Fan, L. Batina, and I. Verbauwhede, “HECC Goes Embedded: An Area-efficient Implementation of HECC,” In Selected Areas in Cryptography, 15th Annual International Workshop, SAC 2008, Lecture Notes in Computer Science 5381, R. Avanzi, L. Keliher, and F. Sica (eds.), Springer-Verlag, 15 pages, 2008.
[PDF] Sakiyama, K., Batina, L., Preneel, B., and Verbauwhede, I. ,”HW/SW co-design for public-key cryptosystems on the 8051 micro-controller,” Comput. Electr. Eng. 33, 5-6 (Sep. 2007), 324-332.
[LNK] J. Fan, K. Sakiyama, and I. Verbauwhede, “Montgomery Modular Multiplication Algorithm on Multi-Core Systems,” In IEEE Workshop on Signal Processing Systems: Design and Implementation (SIPS 2007), IEEE, pp. 261-266, 2007.
[PDF] Simpson, E., Yu, P., Schaumont, P., Ahuja, S., and Shukla, S. ,”VT Matrix Multiply Design for MEMOCODE ’07,” In Proceedings of the 5th IEEE/ACM international Conference on Formal Methods and Models For Codesign (May 30 – June 02, 2007). Conference on Formal Methods and Programming Models for Codesign. IEEE Computer Society, Washington, DC, 95-96.
[PDF] K. Sakiyama, L. Batina, B. Preneel, and I. Verbauwhede, “Multi-core Curve-based Cryptoprocessor with Reconfigurable Modular Arithmetic Logic Units over GF(2n),” IEEE Transactions on Computers 56(9), pp. 1269-1282, 2007.
[PDF] K. Sakiyama, E. De Mulder, B. Preneel, and I. Verbauwhede, “A Parallel Processing Hardware Architecture for Elliptic Curve Cryptosystems,” In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006), IEEE, pp. III-904-III-907, 2006.
[PDF] K. Sakiyama, B. Preneel, and I. Verbauwhede, “A Fast Dual-Field Modular Arithmetic Logic Unit and Its Hardware Implementation,” In IEEE International Symposium on Circuits and Systems (ISCAS 2006), IEEE, pp. 787-790, 2006.
[PDF] K. Sakiyama, L. Batina, B. Preneel, and I. Verbauwhede, “Superscalar Coprocessor for High-speed Curve-based Cryptography,” In Cryptographic Hardware and Embedded Systems – CHES 2006, Lecture Notes in Computer Science 4249, L. Goubin, and M. Matsui (eds.), Springer-Verlag, pp. 415-429, 2006.
[PDF] Y. K. Lee, H. Chan, and I. Verbauwhede, “Throughput Optimized SHA-1 Architecture Using Unfolding Transformation,” In 17th IEEE International Conference on Application-specific Systems, Architectures and Processors (ASAP 2006), IEEE, pp. 354-359, 2006.
[PDF] P. Yu, P. Schaumont, “Executing Hardware as Parallel Software for Picoblaze Networks,” 16th International Conference on Field Programmable Logic and Applications (FPL 2006), Madrid, Spain, August 2006.
[PDF] L. Batina, A. Hodjat, D. Hwang, K. Sakiyama, and I. Verbauwhede, “Reconfigurable architectures for curve-based cryptography on embedded micro-controllers,” In 16th International Conference on Field Programmable Logic and Applications (FPL 2006), IEEE, pp. 1-4, 2006.
[LNK] K. Sakiyama, L. Batina, B. Preneel, and I. Verbauwhede, “HW/SW Co-design for Accelerating Public-Key Cryptosystems over GF(p) on the 8051 µ-controller,” In World Automation Congress (WAC 2006), IEEE, pp. 1-6, 2006.
[LNK] Yang, S., Sakiyama, K., and Verbauwhede, I. ,”Efficient and secure fingerprint verification for embedded devices,” EURASIP J. Appl. Signal Process. 2006 (Jan. 2006), 24-24.
[LNK] A. Hodjat, L. Batina, D. Hwang, and I. Verbauwhede, “A Hyperelliptic Curve Crypto Coprocessor for an 8051 Microcontroller,” In IEEE Workshop on Signal Processing Systems: Design and Implementation (SIPS 2005), IEEE, pp. 93-98, 2005.
[LNK] L. Batina, D. Hwang, A. Hodjat, B. Preneel, and I. Verbauwhede, “Hardware/Software Co-design for Hyperelliptic Curve Cryptography (HECC) on the 8051 microprocessor,” In Cryptographic Hardware and Embedded Systems – CHES 2005, Lecture Notes in Computer Science 3659, J. R. Rao, and B. Sunar (eds.), Springer-Verlag, pp. 106-118, 2005.
[LNK] Thomas Hjorth, “Supporting Privacy in RFID Systems,” Master’s thesis, Informatics and Mathematical Modelling, Technical University of Denmark, 2004.
[PDF] D. Ching, P. Schaumont, and I. Verbauwhede, “Integrated modeling and generation of a reconfigurable network-on-chip,” In 18th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2004), IEEE, pp. 139-145, 2004.
[PDF] Matsuoka, Y., Schaumont, P., Tiri, K., and Verbauwhede, I. “Java cryptography on KVM and its performance and security optimization using HW/SW co-design techniques,”. In Proceedings of the 2004 international Conference on Compilers, Architecture, and Synthesis For Embedded Systems (Washington DC, USA, September 22 – 25, 2004). CASES ’04. ACM, New York, NY, 303-311.

Modeling in the GEZEL Language

Datapaths are the basic building blocks in GEZEL, similar to a module in Verilog or an entity in VHDL. Registers, signals and expressions are essential modeling elements. Expressions can be assembled in datapath definitions. Datapaths can finally be composed to larger architectures.

Register and Signals GEZEL models synchronous, single-clock designs. Yet, a clock signal is not present in GEZEL language, it is implicit in the design description. By looking at a GEZEL program, you can say precisely how it will behave as a clock-cycle true description. You can do this by looking at the kind of variables it uses in calculations. GEZEL has two kinds of variables: signals and registers.

A signal can hold a value within a single clock cycle. It has the same meaning as a wire in an actual implementation. A signal also has a name and a type and is created with the sig keyword. For example, a signal with name v12 and type ns(12) is created as follows.

sig v12 : ns(12);

This type ns(12) stands for a 12-bit unsigned number. Signal v12 can hold values from 0 to 4095. When you force this signal to hold values outside of this range, precision loss will occur. There is one other type available for values, called tc(n). This type represents arbitrary-length signed numbers with two’s complement representation. For example, to create the equivalent of a C integer on a 32-bit machine, use the following definition.

sig aCinteger : tc(32);

Registers are used to store values over multiple clock cycles. In contrast to signals, register variables have two values: a current-value and a next-value. The current-value is the value available at the output of a register, so it is the value obtained when reading from the register. The next-value is the value at the input of the register, so it is the value that is being written into the register. A register is created in the same way as a signal but uses the reg keyword. A 16-bit unsigned register for example is created as

reg r : ns(16);

The register lies at the basis of clock-cycle-true behavior. There are implicit simulation semantics tied to the register. At the start of each clock cycle, the next-value (of the previous clock cycle) is copied into the current-value (of the current clock cycle). In between clock edges, the next-value is updated based on the current-value, constants and inputs. This way, it is possible to create clock-cycle true descriptions without mentioning the clock explicitly. The initial value of a register is zero (0), while the initial value of a signal is undefined.

Expressions Expressions enable calculations with signals and registers. Expressions are formed using operators that reference the names of signals and registers. For example, an addition of two signals b and c into signal a looks like

a = b + c;

When a has insufficient precision to hold all possible combinations of the sum b + c, precision loss can occur. For example, assume the following types for a, b and c:

sig a, b, c : ns (8);

Clearly, when b + c is bigger than 256, the result cannot be stored in a. GEZEL will throw out bits at the most-significant side of the result (overflow). If b + c is 260, then the resulting value in a will be 4 (260 = 256 + 4). In some expressions, intermediate values will occur. In the above expression, b + c is such an intermediate value. A more obvious example is

a = ((b+b) + (c+c));

Brackets are used to indicate the order in which this expression is to be evaluated. First, the sums b+b and c+c are obtained. These two intermediate values are combined and assigned to a. Intermediate values need a type, too. GEZEL uses a default type rule to choose the type of intermediate results, such that the intermediate result does not loose precision.

Expressions combine signals and registers with operators. Operators have a precedence, a preferred order of evaluation. For example, in an expression such as

a = b * b + c * c;

the multiplications (*) will be performed before the additions (+), because multiplication has a higher precedence than addition. Precedence rules can be modified by using round brackets. The following bullets introduce the different operators that can be used in expressions, starting at the ones with low precedence and going up to high-precedence operations.

Assignment and Selection

The assignment operation updates the value of a signal or register. The selection operation conditionally extracts the value of a signal or register.

a = expression
The assignment operation assigns the value of epxression into a. At the moment of assignment, the value of expression is casted in a (cfr the casting operation).
b ? c : d;
The selection operation implements choice. The value of b is evaluated. When it is nonzero, the expression evaluates to c. When it is zero, the result is d.

Bitwise Logical Operations

Bitwise logical operations combine two bitpatterns into a new bitpattern. The bits at corresponding indices are combined using a single-bit logical operations. The logical operations are Inclusive Or, Exclusive Or, and And.

b | c
The bit pattern in b is OR-ed with the bit pattern in c
b ^ c
The bit pattern in b is XOR-ed with the bit pattern in c
b & c
The bit pattern in b is AND-ed with the bit pattern in c
The bit pattern in b is inverted (This operation has higher precedence than all two-operand operations).

Comparison Operations

Comparison operations compare the value of two expressions and yield a true-or-false result. The value true or false is represented as a 1-bit unsigned number (ns(1)), with 1 indicating true, and 0 indicating false.

a == b
True if the value of a is equal to the value of b
a != b
True if the value of a is different from the value of b.
a < b
True if the value of a is smaller than the value of b.
a > b
True if the value of a is bigger than the value of b.
a <= b
True if the value of a is smaller than or equal to the value of b.
a >= b
True if the value of a is bigger than or equal to the value of b.

Arithmetic Operations

Arithmetic Operations do calculations on all of the bits of a signal or register, treated as an unsigned number or else a two’s complement signed number.

a << b
a is shifted left over b positions. The wordlength of the result is equal to the wordlength of a plus 2-to-the-power (wordlength of b).
a >> b
a is shifted right over b positions. The wordlength and the sign of the result are equal to that of a (arithmetic shift).
a + b
a is added to b.
a - b
b is subtracted from a.
a * b
a and b are multiplied.
a # b
The bitpatterns of a and b are concatenated.
Negate the value The bitpatterns of a and b are concatenated.

Cast Operation

A cast operation converts the value of a signal into one with another type. This way, it is possible to convert for example a 5-bit unsigned number into a 6-bit signed number. When the target type has enough bits, no precision will be lost. For two’s complement signed numbers, a concept called sign extension is applicable. Sign extension preserves the sign of a two’s complement number when the wordlength increases. When the target type has insufficient bits, some precision can be lost. Bits are chopped off at the most-significant side. The resulting bitpattern is interpreted as a signed/unsigned number of the targeted wordlength.

For example, if a is ns(8) and holds the value 7, and b is tc(4), then the following operation will leave the bitpattern 0b1111 in b, which is interpreted as -1.


b = (tc(3)) a;

Bit Selection Operation

A bit selection operation extracts part of a bitpattern in a word. There is a single-bit format as well as a bitvector format.

Returns bit n from a as a ns(1) number. n has to be a positive constant. If n is bigger than the wordlength of a, 0 is returned.
Return bitvector from bit m to bit n (n >= m) from a as a ns(n-m+1) number. n and m have to be positive constants. If a bit index goes out of the wordlength range of a, 0 is returned for that bit.

Lookup Table Operation

A Lookup Table Operation offers access to a constant array, which is defined earlier in the code. The lookup table content needs to be defined first, after which it can be accessed using a lookup table operation. A Lookup Table definition is done by enumerating all the elements in the lookup table in a comma separated list as follows:


lookup a : ns(8) = {15, 22, 36, 0x4f};

This defines a lookup table a which holds elements of type ns(8). The table holds 4 elements. The element at index position 0 is 15 and the element at index position 3 is 0x4f (79).

The lookup table access operation simply access the array using the index in between round brackets. For example, to access the third element of a, one would use


Signal Flow Graphs The cycle-true execution model of GEZEL expresses concurrency by allowing multiple expressions to be evaluated in the same clock cycle. A set of expressions that execute together in the same clock cycle are grouped together in a signal flowgraph. Consider the design of a Viterbi Butterfly operation (a well-known operation in convolutional decoding). This operation processes tuples of data according to an operation called add-compare-select. 

y1 = min( d1 + a, d2 - a )
y2 = min( d1 - a, d2 + a )

Assume the following set of signals and registers.


sig a1, s1, a2, s2 : ns(8);  // intermediate signals
reg d1, d2, y1, y2 : ns(8);  // input and output tuple
reg a : ns(8);

The signals flowgraph of expressions that implements this equation can be as follows.


always {
   a1 = d1 + a;
   s1 = d1 - a;
   a2 = d2 + a;
   s2 = d2 + a;
   y1 = (a1 > s2) ? s2 : a1;
   y2 = (s1 > a2) ? a2 : s1;

The keyword always indicates that the group of expressions following it will execute each clock cycle. A signal flow graph can hold an arbitrary number of expressions. All expressions within a single signal flow graph are concurrent within one clock cycle. The order in which expressions are evaluated is independent of the order in which they appear in the GEZEL program (i.e., it is independent of their lexical order). Rather, the order is determined by the data precedences of signals and registers. A register can always be read, at any moment during a clock cycle. As discussed in Section 2.1 on page 9, a register has both a current value and a next value. For a signal, this is not the case. A signal has only an immediate value, valid within a single clock cycle. Thus, a signal has to be written first before it can be read. It has to be written the first time within a clock cycle based on values in registers and constants. As a consequence of this property of signals and registers, the order of expressions within a signal flow graph becomes irrelevant. For example, if you would write:


always {
   y1 = (a1 > s2) ? s2 : a1;
   y2 = (s1 > a2) ? a2 : s1;
   a1 = d1 + a;
   s1 = d1 - a;
   a2 = d2 + a;
   s2 = d2 + a;
Named Signal Flow Graphs Besides the unnamed always signal flow graph, you can create signal flow graphs with a name using the sfg keyword. For example, the previous signal flow graph could be written as: 

sfg mysfg {
   y1 = (a1 > s2) ? s2 : a1;
   y2 = (s1 > a2) ? a2 : s1;
   a1 = d1 + a;
   s1 = d1 - a;
   a2 = d2 + a;
   s2 = d2 + a;

The difference between a named signal flowgraph (sfg) and the unnamed always is that the former does not automatically execute each clock cycle. GEZEL will allow you to create a controller that schedules the named signal flowgraph.

Datapath Modules A datapath corresponds to a module in Verilog or an entity in VHDL. It is a piece of hardware logic that is treated as a single entity by subsequent RT- and logic synthesis tools. A datapath combines a number of named signal flow graphs with a list of input and output signals. A signal flow graph can be thought of as an instruction for that datapath. A datapath can have only a single always signal flow graph, but it can have multiple named signal flow graphs.A datapath is the smallest GEZEL unit that can be simulated. So, subsequent examples will be fully self-contained programs rather than snippets. Here is an example of a 2-bit counter as a hardwired datapath


dp counter(out value : ns(2)) {
   reg c : ns(2);
   always {
     value = c;
     c = c + 1;
     $display("Cycle ", $cycle, ": counter = ", value);

system S {

This datapath has a single output port called value. An output port also has a type, indicated after the colon following the port name. The ports define the outline of the datapath. The only way an ‘outsider’ can access the datapath is by reading/writing values on the datapath ports.

On line 2, we create a 2-bit register. This register is local to the datapath counter. It can be accessed only from within the datapath.

On line 3-7, we define unnamed always signal flowgraph. It contains, besides expressions, also a directive on line 6. A GEZEL directive does affect how the simulator behaves, but it does not affect the simulation outcome. In this case we are using the display directive, which is used to print out values on the datapath. One special variable that is accessed is called $cycle. This variable returns the current simulation cycle. Thus, the effect of the display directive will be to print out the current simulation cycle as well as the output value of the counter.

Finally, on lines 10-11, the toplevel of the system is expressed. A GEZEL file must always have a system statement.

A datapath definition thus consists of three elements: An IO definition, a definition of local signals and registers, and a set of signal flowgraphs. The IO-definition can create input – as well as output ports. For example, a simple ALU that can add, subtract and accumulate would look as follows.


dp alu(in x, y : ns(8); out z : ns(8)) {
   reg acc : ns(8);
   sfg add {
     z = x + y;
   sfg sub {
     z = x - y;
   sfg accumulate {
     acc = acc + x;
     z   = acc + x;
   sfg rst {
     acc = 0;
     z   = 0;

There are four named signal flowgraphs in this example. The datapath has two inputs, x and y, and one output, z. There is an internal accumulator register, acc. There is one signal flowgraph call rst. This will be used to reset the accumulator register. During this reset operation, we will also drive the output of the datapath to zero.

Not all datapath definitions that one can write down in GEZEL are valid. There are four rules to which a datapath definition must conform. When any of those rules are violated, then either the GEZEL parser will reject your code, or else a runtime error message will be triggered. The four rules are enumerated below.

  • During any clock cycle, all datapath outputs are defined.
  • During any clock cycle, no combinatorial loop between signals can exist. This happens when there is a circular dependence on signal values, i.e. signal a is used to define signal b, and signal b is used to define signal a. This implies that all signal values will eventually only be dependent, during any clock cycle, on datapath inputs, datapath registers and constant values.
  • If an expression uses the value of a signal during a particular clock cycle, then that signal must also appear at the left-hand side of an assignment expression in the same clock cycle.
  • Neither registers, nor signals or datapath outputs can be assigned more than once during a clock cycle. A special case of this is that a datapath input cannot be assigned inside of a datapath, because a datapath input must be driven by the output of another datapath.

Here are a few examples of erroneous signal flowgraphs.


// WRONG: output v is not always defined
dp bad1(out v : ns(1)) {
  always {}

// WRONG: a combinatorial loop between signals
dp bad2 {
  sig a, b : ns(1);
  always {
  a = b + 1; // a defines b, b defines a
  b = a + 1; // and both are signals (not registers)

// WRONG: dangling signal
dp bad3 {
  sig a, b : ns(1);
  always {
   a = b + 1;  // b is unknown

// WRONG: a signal is assigned more than once
dp bad4 {
  sig a : ns(1);
  always {
    a = 1;
    a = 5;
Structural Hierarchy Datapaths can be included inside of other datapaths, thus implementing structural hierarchy. For this purpose, GEZEL provides the keyword use. Consider the example of a 4-input AND gate.


 1 dp andgate(in a, b : ns(1); out q : ns(1)) {
 2   always {
 3     q = a & b;
 4   }
 5 }
 7 dp andgate2 : andgate
 8 dp andgate3 : andgate
10 dp fourinputand(in a, b, c, d : ns(1); out q : ns(1)) {
11   sig s1, s2 : ns(1);
12   use andgate ( a,  b, s1);
13   use andgate2( c,  d, s2);
14   use andgate3(s1, s2,  q);
15   always {
16      $display(a," ", b, " ", c, " ", d, " -> ", q); 
17   }
18 }
20 dp tst(out a, b, c, d : ns(1)) {
21   reg n : ns(4);
22   always {
23     n = n + 1;
24     a = n[0]; b = n[1]; c = n[2]; d = n[3];
25   } 
26 }
28 dp sysandgate {
29   sig a, b, c, d, q : ns(1);
30   use tst(a, b, c, d);
31   use fourinputand(a, b, c, d, q);
32 }
34 system S {
35   sysandgate;
36 }

Lines 10-18 define a four-input AND gate using three two-input AND gates. A use statement allows to include a two-input AND gate inside of the four-input AND gate. Connections can be made to datapath inputs, outputs or local signals. Of course, the semantic requirements enumerated earlier must be obeyed. Lines 20-26 define a testbench that enumerates all 4-bit input patterns by decomposing the bits of a counter. Finally, lines 28-32 interconnect the testbench to the four-input AND gate in a system block.

Datapath Cloning Sometimes, multiple copies of one and the same datapath are needed. GEZEL provides a cloning operation to create such an identical copy of a single datapath. The next example shows how three identical AND gates can be created by defining one and then cloning the first AND gate two times.


 dp andgate(in a, b : ns(1); out q : ns(1)) {
   always {
     q = a & b;

 dp andgate2 : andgate
 dp andgate3 : andgate

Cloning creates an identical but independent copy. If the parent datapath includes a register, then the cloned datapath will contain its’ own register. This completes basic modeling techniques for datapaths. The next section covers the modeling of controllers, that enable the use of datapath with multiple signal flowgraphs.

Controller Design The control/datapath model of GEZEL is based on a more generic form of register-transfer level modeling called Finite State Machine and Datapath, or FSMD for short. An FSMD model expresses both datapath operations as well as control operations. It makes a clear distinction however between what is control and what is data processing. An FSMD consists of two cross-coupled state machines. One plays the role of the controller, the other plays the role of the datapath. Information exchange between the two includes conditions (going from the datapath to the controller) and instructions (going from the controller to the datapath).An FSMD provides separate modeling for data processing and for control processing. That is for a good reason, in practice there are many differences between the controller and the datapath. First, the modeling style for the two is different. Datapaths are modeled with expressions on signals and registers. Controllers are modeled with state transition graphs. Second, the logic implementation style of the two also shows differences. A datapath with operators typically exhibits a regular logic style. A controller on the other hand exhibits an irregular logic style.

The FSMD concepts map as follows to the GEZEL model.

  • Instructions are created by selecting one or more sfg out of a datapath. A single sfg can be directly referred to by its name. A set of sfg is enumerated as a comma-separated list in between brackets. For example, assume a datapath is defined as follows. 
    dp adp(out a : ns(3)) {
       sig k : ns(2);
       sfg f1 { a = 3; }
       sfg f2 { k = 2;
                a = 2;}
       sfg f3 { k = 1; }

    Then, the following are valid instructions: 

     (f1, f3)

    Examples of invalid instruction are: 

     (f1, f2)

    These are invalid because the violate the semantic requirements for datapath models.When an instruction is executed during a particular control step of a controller, then that will imply execution of the sfg included in the instruction as well.

  • Conditions are created out of logical expressions on registers in the datapath. When conditions are extracted out of datapath inputs or signals, the GEZEL parser will issue a warning. The reason is that GEZEL selects the instruction to execute right at the start of a clock cycle. Before this can be done, any required conditions need to be defined. At the start of a clock cycle however, the only stable values are constants and register outputs. A user can still continue the simulation despite the presence of this warning. However, one must realize at that moment there is a potential risk for anticausal simulation effects (e.g. using the value of a signal before it is available). Therefore, when this warning occurs one must consider if the code can be written such that no warnings appear.
  • The connection between a datapath and a controller is established by refering the name of the datapath while creating the controller. Some earlier examples of this could be seen with the hardwired controller: 
    dp adp(out a : ns(3)) {
     hardwired h_adp(adp) { f1; }

    In this example, a controller called h_adp is created and attached to datapath adp.

Sequencer Controllers Besides the trivial hardwired controller discussed above, the simplest controller is the sequencer. As the name indicates, a sequencer will execute a set of instructions sequentially, without taking any conditions into account.A typical case where sequencers are useful is for static, fixed schedules. Consider for example a 4-tap decimating averaging filter. Such a filter reads four subsequent samples, integrates and dumps the sum of the samples at every fourth sample.


 1 dp avg(in i : ns(8); out o : ns(8)) {
 2   reg acc : ns(9);
 3   sfg phase0  { acc = i; o = 0; }
 4   sfg phase12 { acc = acc + i; o = 0;}
 5   sfg phase3  { o   = (acc + i) >> 2;}
 6 }
 7 sequencer h_avg(avg) { phase0;
 8                        phase12;
 9                        phase12;
10                        phase3;}
12 dp tst(in i : ns(8); out o : ns(8)) {
13    reg a : ns(8);
14    always {
15      o = a;
16      a = a + 2;
17      $display("C", $cycle, ": i=", o, " o=", i);
18    }
19 }
21 dp sysavg {
22   sig i, o : ns(8);
23   use avg(i, o);
24   use tst(o, i);
25 }
27 system S {
28   sysavg;
29 }

An averaging filter has four phases. As the datapath in lines 1-6 illustrates, there is an initialization instruction (phase0), an accumulation instruction (phase12) and a termination instruction (phase3). The controller for this datapath is a sequencer with four steps, as shown in lines 7-10. Lines 12-19 show a simple testbench that will feed a string of even numbers to this four-phase averager. Finally, lines 21-29 show the system interconnect function.

An important motivation for developing FSMD models, instead of plain hardwired datapaths, is that an FSMD allows to express operation sharing in an elegant way. Consider the descriptions in phase0, phase12 and phase3. They specify two assignments on an accumulator register and three assignments to an output port without the use of a multiplexer. When the same behavior would be represented in a single sfg, it would look like this:


reg phase : ns(2);
 sfg singlephase {
   acc = (phase == 0) ? i : acc + i;
   o   = (phase == 3) ? (acc + i) >> 2 : 0;
   phase = phase + 1;

This description style gives you precise control over how the implementation will look like, but requires more modeling as the control operations have to be written down explicitly as expression.

FSM Controllers A Finite State Machine implements conditional control sequencing on a datapath. The control model is captured by a state transition graph. A Finite State Machine can be in a well-defined number of states. One of these states is the initial state, it is the state the FSM is in when it first initializes.

A Finite State Machine will take one state transition per clock cycle. During this state transition, a datapath instruction (one or more sfg) can be executed. A state transition can be conditional. In that case, the condition is based on the values of registers in the datapath (or on logical expressions directly derived from it). When state transitions are conditional, then the set of conditions must be complete. This means that, for every if (true-branch), there must be a complimentary else (false-branch).

Consider the following simple example of FSM modeling.


fsm h_avg(avg) {
  initial s0;
  state s1, s2, s3;
  @s0 phase0  -> s1;
  @s1 phase12 -> s2;
  @s2 phase12 -> s3;
  @s3 phase3  -> s0;

This description creates four states, called s0, s1, s2 and s3. s0 is the initial state, the others are normal states. A state transition indicates the start state with the @ symbol, and the target state with an arrow (->). In between, a datapath instruction is indicated. A single sfg can be written as such, a group of sfg is specified as a comma-separated list in between round brackets.

State transitions can also be made conditional. The state transition condition needs to be a value directly calculated from a register. An example of conditional state transitions is as follows.


@s0 if      ( c1 &  c2) then (sfg1) -> s0;
     else if ( c1 & ~c2) then (sfg2)  -> s0;
     else if (~c1 &  c3) then (sfg3) -> s0;
     else if (~c1 & ~c3) then (sfg4) -> s0;

This can also be written hierarchically, as follows.


@s0 if (c1) then
       if (c2) then (sfg1) -> s0;
               else (sfg2) -> s0;
       if (c3) then (sfg3) -> s0;
               else (sfg4) -> s0;

Next is an example with slightly more complicated FSM control. The example is a raster line drawing routine, known as the Bresenham Algorithm. The strong point of this algorithm is that it can draw lines of arbitrary slope on a discrete (X,Y) grid, and without the use of floating point arithmetic. The complete GEZEL listing illustrates how a slightly more complicated design looks like.


 1 // Bresenham line plotter for points in an arbitrary octant
 2 dp bresen(in x1_in, y1_in, x2_in, y2_in : tc(12)) {
 3    reg x, y             : tc(12);   // current plot position
 4    reg e                : tc(12);   // accumulated error
 5    reg eol              : tc(1);    // end-of-loop flag
 6    reg einc1, einc2     : tc(12);   // increments
 7    reg xinc1, xinc2     : tc(12);
 8    reg yinc1, yinc2     : tc(12);
 9    sig se, sdx, sdy     : tc(12);
10    sig asdx, asdy       : tc(12);
11    sig stepx, stepy     : tc(12);
13   sfg init {
14     // evaluate range of pixels and their absolute value
15     sdx   = x2_in - x1_in;  asdx = (sdx < 0) ? -sdx : sdx;
16     sdy   = y2_in - y1_in;  asdy = (sdy < 0) ? -sdy : sdy;
17     // determine direction of x and y increments
18     stepx = (sdx < 0) ? -1 : 1;
19     stepy = (sdy < 0) ? -1 : 1;
20     // initial error
21     se    = (asdx > asdy) ? 2 * asdy - asdx : 2 * asdx - asdy;
22     // error increment for straight (einc1) and diagonal (einc2) step
23     einc1 = (asdx > asdy) ? (asdy - asdx) : (asdx - asdy);
24     einc2 = (asdx > asdy) ?  asdy         :  asdx;
25     // increment in x direction for straight and diagonal steps
26     xinc1 = (asdx > asdy) ? stepx : stepx;
27     xinc2 = (asdx > asdy) ? stepx : 0;
28     // increment in y direction for straight and diagonal step
29     yinc1 = (asdx > asdy) ? stepy : stepy;
30     yinc2 = (asdx > asdy) ? 0     : stepy;
31     // initialize registers
32     x    = x1_in;  y    = y1_in; 
33     e    = se;
34   }
36   // end-of-loop test - check if we reach target
37   sfg looptest {
38     eol   = ((x == x2_in) & (y == y2_in));
39   }
41   // loop body: adjust x, y and error accumulator
42   // use error value to decide straight or diagonal step
43   sfg loop {
44     x     = (e >= 0) ? x + xinc1 : x + xinc2;
45     y     = (e >= 0) ? y + yinc1 : y + yinc2;
46     e     = (e >= 0) ? e + einc1 : e + einc2;
47     $display($hex,"Cycle: ",$cycle," Plot point (", x, ",", y, ") ");
48   }
49 }
50 // controller for bresenham algorithm
51 // initializes, draws one line and then waits in state s3
52 fsm f_bresen(bresen) {
53   initial s0;
54   state s1, s2, s3;
55   @s0 (init)                -> s1;
56   @s1 (loop, looptest)      -> s2;
57   @s2 if (eol) then (init)  -> s3;
58       else (loop, looptest) -> s2;
59   @s3 (init)                -> s3;
60 }
62 // testbench
63 dp test_bresen(out x1, y1, x2, y2 : tc(12)) {
64   sig sx : tc(12);
65   sfg run {
66     x1 = 5; x2 = 18; y1 = 2; y2 = 8;
67   }    
68 }
69 hardwired h_test_bresen(test_bresen) {run;}
71 dp sysbresen {
72   sig x1, y1, x2, y2 : tc(12);
73   use bresen(x1, y1, x2, y2);
74   use test_bresen(x1, y1, x2, y2);
75 }
77 system S {
78   sysbresen;
79 }

The Bresenham datapath accepts two coordinate tuples, indicating the starting resp. ending points of the vector to be drawn. The bulk of the calculation of the algorithm takes place in an initialization phase, for which a single sfg is created (lines 13-34). Basically, the Bresenham algorithm works with three accumulators: one for the x coordinate (register x), one for the y coordinate (register y), and one error accumulator (register e). At runtime, the error accumulator is evaluated to decide on the required increments in the x and y accumulators.

Not all vectors have the same length, and the Bresenham algorithm only takes a single step (horizontal, vertical or diagonal) per iteration. Because each clock only a single iteration of the Bresenham algortihm is executed, a complete line takes a variable number of clock cycles to generate a vector. Lines 37-39 contain a loop test that decide when to terminate a loop. The actual loop body, which contains the error accumulations, is shown in lines 43-48.

The FSM controller of the Bresenham algorithm is shown in lines 52-60. After initialization, the algorithm takes a first iteration of the loop and evaluates the end-of-loop flag on line 56. From then on, the FSM takes conditional state transitions, which will take it back each time from state s2 to state s2 (line 58), or else terminate the loop into state s3 (line 57). The test (eol) checks when the end-of-loop flag becomes true. This test is taken on the value in a register, so it actually checks the end-of-loop condition of the previous iteration. For this reason, the instruction of the transition into s3 is an initialization instruction (line 57). When the output of eol is high, the x and y accumulators are already at there target position, and no more increments should be done.

Finally, lines 63-79 show a simple testbench for the vector generator. The test will evaluate pixels from the vector running from (5,2) to (18,8) (line 66).


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Virtual Fashion Technology

Virtual Fashion Education


"chúng tôi chỉ là tôi tớ của anh em, vì Đức Kitô" (2Cr 4,5b)


News About Tech, Money and Innovation


Modern art using the GPU

Theme Showcase

Find the perfect theme for your blog.


Learn to Learn

Gocomay's Blog

Con tằm đến thác vẫn còn vương tơ

Toán cho Vật lý

Khoa Vật lý, Đại học Sư phạm Tp.HCM - ĐT :(08)-38352020 - 109

Maths 4 Physics & more...

Blog Toán Cao Cấp (M4Ps)

Bucket List Publications

Indulge- Travel, Adventure, & New Experiences


‎"Behind every stack of books there is a flood of knowledge."

The Blog

The latest news on and the WordPress community.

%d bloggers like this: