RTL Design: Writing Synthesisable Verilog and VHDL (Complete Guide)
What this article covers: This guide explains what Register Transfer Level (RTL) means, how to write synthesisable Verilog and VHDL, which constructs are synthesisable and which are not, RTL coding best practices, FSM coding styles, reset strategies, clock domain crossing, lint rules, and how RTL quality directly affects synthesis, timing closure and silicon success.

RTL design is the foundation of virtually every modern digital chip. From the simplest IoT microcontroller to the most powerful AI accelerator, nearly all digital logic is first described as RTL code – written in Verilog, SystemVerilog, or VHDL – before any synthesis tool, physical design tool, or foundry process touches it. Getting RTL right is one of the most important factors in a successful chip design. RTL bugs found after tapeout can cost millions of dollars and months of schedule. RTL bugs found in simulation cost hours.
In the VLSI design flow, the RTL design stage sits right after system specification and architecture definition. It is the stage where the hardware description language (HDL) code is written that defines exactly what the chip will do. This code must be synthesisable – meaning it must be written in a way that logic synthesis tools (Synopsys Design Compiler, Cadence Genus) can translate it correctly into a gate-level netlist. Non-synthesisable code may simulate correctly but will produce wrong or no hardware at all.
This complete guide teaches you how to write professional, synthesisable RTL design in both Verilog and VHDL, from first principles to advanced industry techniques.
1. What is RTL Design?
RTL design stands for Register Transfer Level design. It is a method of describing digital hardware using a Hardware Description Language (HDL), where the description focuses on how data moves between registers (flip-flops) on each clock cycle, and what combinational logic transforms that data between those transfers. The term “register transfer level” captures both concepts: registers (the flip-flops that store state) and transfers (the movement and transformation of data between them, mediated by combinational logic).
The RTL abstraction level sits between the behavioural level and the gate level. The table below shows where it fits among the abstraction levels used in digital design:
| Abstraction Level | Description | Example | Used For |
|---|---|---|---|
| Behavioural Level | Describes what the system does algorithmically – no timing, no registers implied | if (a > b) max = a; else max = b; | High-level synthesis input, early validation |
| RTL Level ⬅ This | Describes data transfers between registers per clock cycle, with combinational logic between | always @(posedge clk) q <= d; | Logic synthesis input — industry standard |
| Gate Level | Describes the circuit as actual logic gates (AND, OR, NAND, flip-flops) from a standard cell library | Netlist of cell instances and connections | Physical design input, gate-level simulation |
| Switch Level | Describes transistor-level connectivity (NMOS/PMOS switches) | SPICE netlist | Analog/custom IC design |
RTL design is the dominant design methodology in the semiconductor industry because it provides the right level of abstraction: detailed enough to be synthesised into hardware accurately, but abstract enough to allow designers to think in terms of functionality and architecture rather than transistor-level details. When you write RTL, you are describing hardware – not writing software. Every line of RTL code implies actual physical logic gates that will be manufactured on silicon.
💡 RTL Design Is Hardware Description, Not Programming: The most common mistake beginners make is treating RTL like software code. Unlike software, in RTL design: (1) All always blocks execute concurrently, not sequentially. (2) Every signal assignment implies physical wires and logic gates. (3) Timing is real – setup and hold time violations cause silicon failures. (4) You cannot “allocate memory” – storage is always explicit flip-flops or SRAM.
2. RTL vs Behavioural vs Gate-Level Abstraction
To understand RTL design deeply, you must understand how it differs from behavioural coding and gate-level coding. The same hardware can be described at all three levels, but only RTL is the standard input for logic synthesis tools in the industrial VLSI design flow.
Consider a simple 2:1 multiplexer (MUX). Here is how it looks at all three levels:
Behavioural Level (Not Directly Synthesisable for Timing-Critical Paths)
Verilog — Behavioural
// Behavioural - uses delay, not synthesisable in this form
module mux_behav (
input a, b, sel,
output reg y
);
always @(a or b or sel) begin
#5 y = sel ? a : b; // #5 delay - NOT synthesisable
end
endmodule
RTL Level (Synthesisable – Industry Standard)
Verilog — RTL (Synthesisable)
// RTL — fully synthesisable, no delays, proper sensitivity list
module mux_rtl (
input wire a, b, sel,
output reg y
);
always @(*) begin // @(*) = complete sensitivity list
if (sel)
y = a;
else
y = b;
end
endmodule
Gate Level (Post-Synthesis Netlist)
Verilog – Gate Level (Generated by Synthesis Tool)
// Gate-level - generated by synthesis tool, not written by hand
module mux_gate (a, b, sel, y);
input a, b, sel;
output y;
MX2_HVT U1 (.A(a), .B(b), .S(sel), .Y(y)); // Standard cell instance (cell name and pin polarity are library-specific)
endmodule
The RTL version is what you write as an engineer. The gate-level version is what the synthesis tool produces. Gate-level is never hand-coded for complex designs – that would take years. This illustrates exactly why RTL design is so powerful: one line of RTL can produce hundreds of gates automatically.
3. Verilog vs VHDL vs SystemVerilog: Which to Use for RTL Design?
There are three main HDLs used for RTL design in the industry today. Choosing the right one depends on your design domain, company preference, and the geographic region you work in.
| Language | Year | Primary Use | Region / Domain | Strengths | Weaknesses |
|---|---|---|---|---|---|
| Verilog (IEEE 1364) | 1984 (IEEE standard 1995) | RTL design, synthesis | USA, Asia (dominant) | Concise syntax, C-like, easy to learn, very widely supported | Weaker type system, easier to write bugs |
| VHDL (IEEE 1076) | 1987 | RTL design, simulation | Europe, defence/aerospace | Strong typing, verbose but explicit, DoD/aerospace standard | Verbose, steeper learning curve, slower to write |
| SystemVerilog (IEEE 1800) | 2002 (IEEE standard 2005) | RTL design + Verification | Global (increasingly dominant) | Superset of Verilog + OOP verification features, UVM support | Large language, complex features if misused |
💡 Industry Recommendation (2025): For RTL design, SystemVerilog is the modern standard – it gives you all of Verilog’s strengths with improved syntax, explicit data types, interfaces, and better synthesis features. Many leading companies (Qualcomm, Apple, NVIDIA, Google) use SystemVerilog for RTL. VHDL remains dominant in European aerospace and defence (DO-254). Pure Verilog is still widely used in legacy designs and academia.
4. Verilog Basics for RTL Design
Verilog is a Hardware Description Language that forms the backbone of RTL design in most of the semiconductor industry. Understanding its core constructs is essential before writing any synthesisable RTL code.
4.1 Module and Port Declaration
In Verilog, every design unit is a module. A module has ports (inputs and outputs) and internal logic. This is the basic building block of any RTL design.
Verilog – Module Structure
// Module declaration — best practice for RTL design
module adder_8bit (
input wire [7:0] a, // 8-bit input A
input wire [7:0] b, // 8-bit input B
input wire cin, // Carry input
output wire [7:0] sum, // Sum output
output wire cout // Carry out
);
// Continuous assignment — combinational logic
assign {cout, sum} = a + b + cin;
endmodule
4.2 Verilog Data Types for RTL Design
Understanding data types is critical for writing correct, synthesisable RTL design in Verilog:
| Data Type | Use in RTL | Synthesisable? | Example |
|---|---|---|---|
| wire | Connects module ports and continuous assignments – represents physical wires | ✅ Yes | wire [7:0] data_bus; |
| reg | Holds a value in procedural (always) blocks – does NOT always imply a flip-flop! Can be combinational or sequential depending on usage | ✅ Yes | reg [3:0] count; |
| integer | 32-bit signed – useful as a loop index in procedural for loops (generate loops use genvar) | ⚠️ Loop indices only – avoid for data signals in synthesisable RTL | integer i; |
| parameter | Compile-time constant – essential for parameterisable RTL design | ✅ Yes | parameter WIDTH = 8; |
| localparam | Local constant within module – cannot be overridden from outside | ✅ Yes | localparam IDLE = 2'b00; |
| real | Floating-point – for simulation only | ❌ No | real freq = 1.5e9; |
| time | Simulation time type | ❌ No | time t_start; |
4.3 always Block – The Core of RTL Design
The always block is the most fundamental construct in Verilog RTL design. There are two types used in synthesisable RTL:
Verilog — Sequential always Block (Flip-Flop)
// Sequential logic — infers flip-flops
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
q <= 8'h00; // Non-blocking: always use <= for sequential RTL
else
q <= d;
end
Verilog — Combinational always Block
// Combinational logic — @(*) ensures complete sensitivity list
always @(*) begin
case (sel)
2'b00: y = a; // Blocking: always use = for combinational RTL
2'b01: y = b;
2'b10: y = c;
default: y = d; // ALWAYS include default — prevents latch inference
endcase
end
⚠️ Critical RTL Rule: Blocking vs Non-Blocking Assignments
This is the most important rule in Verilog RTL design – and the most common source of bugs:
- Sequential logic (always @(posedge clk)): ALWAYS use non-blocking assignments (<=)
- Combinational logic (always @(*)): ALWAYS use blocking assignments (=)
- NEVER mix both types in the same always block
Using blocking assignments in sequential blocks, or mixing the two types, can create simulation races and simulation-synthesis mismatches – the simulation may pass while the synthesised hardware behaves differently.
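As a minimal sketch of why this rule matters, the two modules below differ only in the assignment operator. The module and signal names are hypothetical and chosen purely for illustration. With non-blocking assignments the code describes a two-stage delay line; with blocking assignments `s1` is updated before it is read, so `q` follows `d` after a single clock edge and the intermediate stage effectively disappears.

```verilog
// Hypothetical example: the intended behaviour is a 2-stage delay line.
module pipe2_nonblocking (
    input  wire clk,
    input  wire d,
    output reg  q
);
    reg s1;
    // Non-blocking: both right-hand sides are sampled before any update,
    // so q receives the OLD value of s1 (two flip-flops in series).
    always @(posedge clk) begin
        s1 <= d;
        q  <= s1;
    end
endmodule

module pipe2_blocking_bug (
    input  wire clk,
    input  wire d,
    output reg  q
);
    reg s1;
    // Blocking: s1 is updated first, then q reads the NEW s1,
    // so q behaves like a single flip-flop fed directly by d.
    always @(posedge clk) begin
        s1 = d;
        q  = s1;
    end
endmodule
```

Driving both modules with the same stimulus in a simulator should show `q` of the blocking version changing one clock cycle earlier than `q` of the non-blocking version.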
5. VHDL Basics for RTL Design
VHDL (VHSIC Hardware Description Language) is the second major HDL used for RTL design. It is strongly typed, more verbose than Verilog, and widely used in European aerospace, defence and automotive chip design (many teams working to DO-254 or ISO 26262 standardise on VHDL, although neither standard mandates a particular language). Understanding VHDL RTL coding is essential for engineers working in these domains.
5.1 VHDL Entity and Architecture
In VHDL, every design unit consists of an entity (defines the interface – ports) and an architecture (defines the implementation – internal logic). This separation of interface and implementation is one of VHDL’s strengths for large-team RTL design.
VHDL — Entity and Architecture
-- VHDL RTL Design: 8-bit Register with Synchronous Reset
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;
use IEEE.NUMERIC_STD.ALL;
-- Entity: defines ports (interface)
entity reg_8bit is
port (
clk : in std_logic;
rst_n : in std_logic;
en : in std_logic;
d : in std_logic_vector(7 downto 0);
q : out std_logic_vector(7 downto 0)
);
end entity reg_8bit;
-- Architecture: defines the implementation (behaviour)
architecture rtl of reg_8bit is
begin
-- Sequential process: infers D flip-flops
reg_proc : process(clk)
begin
if rising_edge(clk) then
if rst_n = '0' then
q <= (others => '0'); -- Synchronous reset
elsif en = '1' then
q <= d; -- Load data when enabled
end if;
end if;
end process reg_proc;
end architecture rtl;
5.2 VHDL Data Types for RTL Design
VHDL’s strong type system prevents many common RTL errors that are possible in Verilog. Key data types for synthesisable RTL design in VHDL include:
| VHDL Type | Description | RTL Use |
|---|---|---|
| std_logic | 9-value logic type (U, X, 0, 1, Z, W, L, H, -) | Standard for all single-bit signals – must include IEEE.STD_LOGIC_1164 |
| std_logic_vector | Array of std_logic – most common multi-bit type | Buses, data paths: std_logic_vector(7 downto 0) |
| unsigned | Unsigned integer (from NUMERIC_STD) | Arithmetic operations – use for arithmetic signals |
| signed | Signed two’s complement integer (from NUMERIC_STD) | Signed arithmetic operations |
| integer | 32-bit integer – synthesisable with range constraint | integer range 0 to 255 – defines bit width |
| boolean | True/false – synthesisable | Control signals, enable flags |
| enumeration | User-defined enumerated types | FSM state encoding: type state_t is (IDLE, ACTIVE, DONE); |
5.3 VHDL Signal vs Variable
One of VHDL’s most commonly misunderstood concepts in RTL design is the difference between signals and variables:
| Aspect | Signal | Variable |
|---|---|---|
| Scope | Defined in architecture or package – visible everywhere in the architecture | Defined inside a process – local to that process only |
| Update timing | Updated at the end of the delta cycle (after process suspension) | Updated immediately when assigned |
| Synthesis result | Always maps to a wire or flip-flop | May or may not produce a register depending on usage context |
| RTL recommendation | Use for inter-process communication, output ports, and main data paths | Use for local intermediate calculations within a process |
| Assignment operator | <= (signal assignment) | := (variable assignment) |
6. Synthesisable vs Non-Synthesisable Constructs
One of the most critical skills in RTL design is knowing exactly which language constructs are synthesisable and which are not. Using non-synthesisable constructs in RTL code is a common beginner mistake that causes the synthesis tool to either error out, produce incorrect hardware, or silently ignore the construct – leading to simulation-synthesis mismatches that are extremely hard to debug.
Verilog: Synthesisable Constructs
| Construct | Synthesisable? | Infers |
|---|---|---|
| assign (continuous assignment) | ✅ Yes | Combinational logic (wires and gates) |
| always @(posedge clk) | ✅ Yes | Flip-flops (sequential logic) |
| always @(*) | ✅ Yes | Combinational logic |
| if-else in always | ✅ Yes | MUX (combinational) or conditional register update (sequential) |
| case / casez / casex | ✅ Yes | MUX / decoder logic |
| for loop with fixed bounds | ✅ Yes | Replicated logic (unrolled at compile time) |
| module instantiation | ✅ Yes | Hierarchical structural design |
| generate | ✅ Yes | Parameterisable repeated or conditional hardware structures |
| parameter / localparam | ✅ Yes | Compile-time constants (no hardware) |
| Arithmetic operators (+, -, *, /) | ✅ Yes (*, / may be large) | Adder, subtractor, multiplier, divider circuits |
| Bitwise operators (&, \|, ^, ~) | ✅ Yes | AND, OR, XOR, NOT gates |
| Reduction operators (&, \|, ^) | ✅ Yes | Multi-input AND, OR, XOR reduction trees |
| Shift operators (<<, >>, <<<, >>>) | ✅ Yes | Barrel shifters or wire routing (constant shifts) |
| Concatenation { , } | ✅ Yes | Wire concatenation / bus manipulation |
| Conditional operator (? :) | ✅ Yes | 2:1 MUX |
Verilog: Non-Synthesisable Constructs (Testbench / Simulation Only)
| Construct | Why Not Synthesisable | Use Instead |
|---|---|---|
| #delay (e.g., #10 clk = ~clk;) | Time delays have no physical hardware equivalent in combinational/sequential logic | Use timing constraints in SDC |
| initial block | Executes once at time 0 – no hardware equivalent (except FPGA with initialisation support) | Use synchronous reset for initialisation |
| $display, $monitor, $finish | System tasks – simulator functions only | Not applicable in RTL – testbench only |
| force / release | Simulator construct for overriding signals – no hardware equivalent | Testbench only |
| fork / join | Parallel thread simulation – no hardware mapping | Testbench only |
| Dynamic memory (new, queues, mailboxes) | SystemVerilog verification constructs – no hardware equivalent | SystemVerilog classes for testbench only |
| File I/O ($fopen, $fread) | Operating system file access – no hardware equivalent | Testbench only |
| Unbounded while loop | May execute indefinitely – synthesis cannot determine hardware size | Use for loop with fixed bounds |
| wait statement | Event-based waiting – not synthesisable | Use synchronous state machines |
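To illustrate where the simulation-only constructs in the table above belong, here is a minimal sketch of a design unit and its testbench. The module names `and_gate` and `and_gate_tb` are hypothetical. The design unit uses only synthesisable constructs, while `initial`, `#` delays, `$display` and `$finish` live in the testbench, which is never handed to the synthesis tool.

```verilog
// Synthesisable design unit (hypothetical example)
module and_gate (
    input  wire a,
    input  wire b,
    output wire y
);
    assign y = a & b;
endmodule

// Simulation-only testbench: never passed to synthesis
module and_gate_tb;
    reg  a, b;
    wire y;

    and_gate dut (.a(a), .b(b), .y(y));

    initial begin
        a = 1'b0; b = 1'b0;      // initial block: simulation only
        #10 a = 1'b1;            // # delays: simulation only
        #10 b = 1'b1;
        #10 $display("a=%b b=%b y=%b", a, b, y);  // system task
        $finish;
    end
endmodule
```

Keeping the two in separate files is a common way to make sure only the design unit appears in the synthesis file list.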
7. Writing Sequential Logic in RTL Design
Sequential logic is the type of digital logic that has memory – its output depends not only on current inputs but also on past states. In RTL design, sequential logic is implemented using flip-flops, which are inferred whenever a signal is assigned inside a clock-edge-triggered always block.
7.1 D Flip-Flop with Synchronous Reset
Verilog — D Flip-Flop (Synchronous Reset)
module dff_sync_rst (
input wire clk, rst_n, d,
output reg q
);
always @(posedge clk) begin
if (!rst_n) // Synchronous reset - checked at clock edge
q <= 1'b0;
else
q <= d;
end
endmodule
7.2 D Flip-Flop with Asynchronous Reset
Verilog – D Flip-Flop (Asynchronous Reset)
module dff_async_rst (
input wire clk, rst_n, d,
output reg q
);
// rst_n in sensitivity list = asynchronous reset
always @(posedge clk or negedge rst_n) begin
if (!rst_n) // Asynchronous — fires regardless of clock
q <= 1'b0;
else
q <= d;
end
endmodule
7.3 Parameterisable Shift Register
Verilog — Parameterisable Shift Register
module shift_reg #(
parameter WIDTH = 8,
parameter DEPTH = 4
) (
input wire clk, rst_n, en,
input wire [WIDTH-1:0] d_in,
output wire [WIDTH-1:0] d_out
);
reg [WIDTH-1:0] shift_mem [DEPTH-1:0];
integer i;
always @(posedge clk) begin
if (!rst_n) begin
for (i = 0; i < DEPTH; i = i + 1)
shift_mem[i] <= {WIDTH{1'b0}};
end else if (en) begin
shift_mem[0] <= d_in;
for (i = 1; i < DEPTH; i = i + 1)
shift_mem[i] <= shift_mem[i-1];
end
end
assign d_out = shift_mem[DEPTH-1];
endmodule
8. Writing Combinational Logic in RTL Design
Combinational logic produces outputs that depend only on the current values of its inputs – there is no memory, no clock. In RTL design, combinational logic is described using continuous assignments (assign statements) or combinational always @(*) blocks. The synthesis tool maps this into the appropriate logic gates from the standard cell library.
8.1 Priority Encoder Using always @(*)
Verilog — Priority Encoder (4-to-2)
module priority_enc_4to2 (
input wire [3:0] req, // 4-bit request bus
output reg [1:0] grant, // 2-bit grant output
output reg valid // Valid output
);
always @(*) begin
valid = 1'b1;
if (req[3]) grant = 2'b11; // Highest priority
else if (req[2]) grant = 2'b10;
else if (req[1]) grant = 2'b01;
else if (req[0]) grant = 2'b00;
else begin
grant = 2'b00;
valid = 1'b0; // No request active
end
end
endmodule
8.2 Avoiding Latch Inference in RTL Design
One of the most critical rules in combinational RTL design is to avoid unintentional latch inference. A latch is inferred when a signal in a combinational always block is not assigned in all possible code paths. Latches are timing-analysis nightmares — they are level-sensitive, not edge-triggered, which causes problems with static timing analysis and often indicates a design error.
❌ Latch Inferred (Bug)
always @(*) begin
if (sel)
y = a; // What is y when sel=0? Latch!
end
✅ No Latch – Correct RTL
always @(*) begin
if (sel)
y = a;
else
y = b; // All paths covered
end
❌ Incomplete case (Latch Bug)
always @(*) begin
case (opcode)
2'b00: result = a + b;
2'b01: result = a - b;
// Missing 2'b10, 2'b11 = latch inferred
endcase
end
✅ Default covers all paths
always @(*) begin
case (opcode)
2'b00: result = a + b;
2'b01: result = a - b;
2'b10: result = a & b;
default: result = 8'h00; // Default prevents latch
endcase
end
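Another common way to cover every path, sketched here with hypothetical module and signal names, is to assign a default value to each output at the top of the block and override it only where needed. Every path through the block then assigns every output, so no latch is inferred even when a branch does not mention a particular signal.

```verilog
// Hypothetical example: default assignments at the top of the block
module alu_default_sketch (
    input  wire [1:0] opcode,
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg  [7:0] result,
    output reg        is_arith
);
    always @(*) begin
        // Defaults: applied whenever a branch below does not override them
        result   = 8'h00;
        is_arith = 1'b0;
        case (opcode)
            2'b00: begin result = a + b; is_arith = 1'b1; end
            2'b01: begin result = a - b; is_arith = 1'b1; end
            2'b10: result = a & b;   // is_arith keeps its default of 0
            default: ;               // 2'b11: both defaults apply
        endcase
    end
endmodule
```

This style scales well when a block drives many outputs, because each branch only needs to list the signals that differ from their defaults.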
9. FSM (Finite State Machine) RTL Coding
Finite State Machines (FSMs) are one of the most fundamental building blocks in RTL design. Nearly every digital controller, protocol handler, sequencer, and arbiter in a chip is implemented as an FSM. Understanding how to code FSMs correctly in RTL is essential for every VLSI engineer.
9.1 Moore vs Mealy FSMs
| Aspect | Moore FSM | Mealy FSM |
|---|---|---|
| Output depends on | Current state only | Current state AND current inputs |
| Output timing | Changes only after a clock edge, together with the state (outputs may additionally be registered) | Combinational (immediate with input change) |
| Number of states | Typically more states needed | Typically fewer states needed |
| Glitching | Immune to input glitches; outputs decoded from the state can still glitch briefly after a transition unless they are registered | Possible input glitch → output glitch |
| Preferred for | Most RTL designs – safer, easier to verify | Protocol interfaces where immediate response needed |
9.2 Three-Process FSM Coding Style (Recommended for RTL Design)
The three-process FSM coding style is the industry best practice for RTL design – it separates state register, next-state logic, and output logic into three distinct always blocks for maximum clarity, maintainability, and synthesis friendliness:
Verilog – Three-Process FSM (Moore – UART TX Controller Example)
module uart_tx_ctrl (
input wire clk, rst_n,
input wire tx_start, // Start transmission
input wire tx_done, // Bit-level done
output reg tx_en, // Enable transmitter
output reg tx_busy // Busy flag
);
// State encoding using localparams
localparam [1:0]
IDLE = 2'b00,
START = 2'b01,
TRANSMIT = 2'b10,
STOP = 2'b11;
reg [1:0] curr_state, next_state;
// PROCESS 1: State Register (Sequential)
always @(posedge clk or negedge rst_n) begin
if (!rst_n)
curr_state <= IDLE;
else
curr_state <= next_state;
end
// PROCESS 2: Next-State Logic (Combinational)
always @(*) begin
next_state = curr_state; // Default: stay in current state
case (curr_state)
IDLE : if (tx_start) next_state = START;
START : next_state = TRANSMIT;
TRANSMIT: if (tx_done) next_state = STOP;
STOP : next_state = IDLE;
default: next_state = IDLE;
endcase
end
// PROCESS 3: Output Logic (Moore — based on state only)
always @(*) begin
// Default outputs to prevent latch inference
tx_en = 1'b0;
tx_busy = 1'b0;
case (curr_state)
IDLE : begin tx_en = 1'b0; tx_busy = 1'b0; end
START : begin tx_en = 1'b1; tx_busy = 1'b1; end
TRANSMIT: begin tx_en = 1'b1; tx_busy = 1'b1; end
STOP : begin tx_en = 1'b0; tx_busy = 1'b1; end
default : begin tx_en = 1'b0; tx_busy = 1'b0; end
endcase
end
endmodule
9.3 FSM State Encoding
| Encoding Style | Example (4 states) | Advantages | Disadvantages |
|---|---|---|---|
| Binary | 00, 01, 10, 11 | Minimum flip-flops (⌈log2 N⌉) | More complex next-state and decode logic, can be slower |
| One-Hot | 0001, 0010, 0100, 1000 | Simple, fast next-state logic, used in high-speed designs | More flip-flops (N flip-flops) |
| Gray Code | 00, 01, 11, 10 | Only one bit changes between consecutive codes – low switching activity, helpful when a counter value crosses clock domains | Single-bit property holds only for transitions that follow the Gray sequence |
| Johnson | 00, 10, 11, 01 | Power efficient – one bit changes per step of the sequence | Needs N/2 flip-flops; for larger N, unused codes need recovery logic |
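As a rough sketch of the one-hot row in the table above, the hypothetical module below encodes four states with one flip-flop each. The state names mirror the UART example, but the module itself, its ports and the `advance` input are assumptions made for illustration. Note how decoding a state reduces to testing a single bit.

```verilog
// Hypothetical example: four states, one-hot encoded
module onehot_fsm_sketch (
    input  wire clk,
    input  wire rst_n,
    input  wire advance,   // step to the next state when high
    output wire in_idle
);
    localparam [3:0]
        IDLE     = 4'b0001,
        START    = 4'b0010,
        TRANSMIT = 4'b0100,
        STOP     = 4'b1000;

    reg [3:0] state;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            state <= IDLE;
        else if (advance) begin
            case (state)
                IDLE    : state <= START;
                START   : state <= TRANSMIT;
                TRANSMIT: state <= STOP;
                default : state <= IDLE;  // STOP and any illegal code return to IDLE
            endcase
        end
    end

    // One-hot decode: a single bit test, no comparator needed
    assign in_idle = state[0];
endmodule
```

Many synthesis tools can also re-encode an FSM automatically, so the hand-written encoding is best treated as a starting point rather than the final implementation.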
10. Reset Strategies in RTL Design
Reset strategy is one of the most important architectural decisions in RTL design. The reset strategy affects power consumption, timing analysis, design robustness, and functional safety compliance. There are two fundamental types of reset:
| Reset Type | How It Works | Verilog Sensitivity List | Advantages | Disadvantages |
|---|---|---|---|---|
| Synchronous Reset | Reset is checked only at the active clock edge — reset must be held for at least one clock cycle | always @(posedge clk) | Clean STA — reset treated as data; no asynchronous paths; better for timing closure | Reset must be asserted for at least one full clock cycle; risk of missing reset if pulse too short |
| Asynchronous Reset | Reset immediately forces the flip-flop to its reset state regardless of the clock | always @(posedge clk or negedge rst_n) | Instantaneous reset; works even if clock is stopped; required in some power-domain scenarios | Introduces asynchronous timing paths (recovery/removal constraints); deassertion must be synchronised to clock to avoid metastability |
Asynchronous Reset with Synchronous Deassertion (Best Practice)
The industry best practice for safety-critical RTL design is to use asynchronous assertion (reset can fire anytime) but synchronous deassertion (release is synchronised to the clock edge). This is implemented using a reset synchroniser:
Verilog – Reset Synchroniser (Best Practice)
// Reset synchroniser — 2-stage synchroniser prevents metastability
// on reset deassertion
module rst_sync (
input wire clk,
input wire async_rst_n, // Async reset from power-on / button
output wire sync_rst_n // Synchronised reset to logic
);
reg [1:0] sync_ff;
always @(posedge clk or negedge async_rst_n) begin
if (!async_rst_n)
sync_ff <= 2'b00; // Async assertion propagates immediately
else
sync_ff <= {sync_ff[0], 1'b1}; // Sync deassertion through 2 FFs
end
assign sync_rst_n = sync_ff[1];
endmodule
11. Clock Domain Crossing (CDC) in RTL Design
Clock Domain Crossing (CDC) is one of the most critical and error-prone aspects of RTL design. A CDC occurs whenever a signal crosses from logic clocked by one clock domain to logic clocked by a different, unrelated (or asynchronous) clock domain. If not handled correctly, CDC violations cause metastability – a state where a flip-flop’s output is neither a clean logic 0 nor a logic 1 – which leads to unpredictable, intermittent chip failures that are extremely difficult to debug.
Types of CDC Signals and Their Solutions
| Signal Type | CDC Technique | Description |
|---|---|---|
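| Single-bit control signal | 2-FF Synchroniser | Two back-to-back flip-flops in the destination domain – the standard synchroniser for single-bit level signals that change slowly relative to the destination clock and are driven directly from a source-domain flip-flop |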
| Single-bit pulse | Pulse synchroniser / Toggle synchroniser | Convert pulse to toggle, synchronise toggle, detect edge in destination domain |
| Multi-bit data bus | Async FIFO (Gray-coded pointers) | FIFO with independent read and write clocks, Gray-coded pointers synchronised across domains – most robust solution |
| Multi-bit control bus | Handshake protocol | Request/acknowledge handshake with synchronised enable signals |
| Multi-bit near-static data | MUX synchroniser / Enable pulse | Data changes only when a synchronised enable/sample pulse is asserted |
Verilog – 2-FF Synchroniser (Single Bit CDC)
module sync_2ff (
input wire clk_dst, // Destination clock
input wire rst_n,
input wire d_src, // Signal from source clock domain
output wire d_dst // Synchronised output in destination domain
);
reg ff1, ff2;
always @(posedge clk_dst or negedge rst_n) begin
if (!rst_n) begin
ff1 <= 1'b0;
ff2 <= 1'b0;
end else begin
ff1 <= d_src; // First FF: may go metastable
ff2 <= ff1; // Second FF: resolves metastability
end
end
assign d_dst = ff2; // Clean synchronised signal
endmodule
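The async FIFO entry in the table above relies on Gray-coded pointers: because only one bit changes per increment, a synchroniser in the other domain samples either the old or the new pointer value, never an inconsistent mixture. A minimal sketch of the two conversions follows; the module and parameter names are hypothetical.

```verilog
// Binary to Gray: each Gray bit is the XOR of two adjacent binary bits
module bin2gray #(
    parameter WIDTH = 4
) (
    input  wire [WIDTH-1:0] bin,
    output wire [WIDTH-1:0] gray
);
    assign gray = bin ^ (bin >> 1);
endmodule

// Gray to binary: bit i is the XOR of Gray bits WIDTH-1 down to i
module gray2bin #(
    parameter WIDTH = 4
) (
    input  wire [WIDTH-1:0] gray,
    output reg  [WIDTH-1:0] bin
);
    integer i;
    always @(*) begin
        for (i = 0; i < WIDTH; i = i + 1)
            bin[i] = ^(gray >> i);   // reduction XOR of the upper bits
    end
endmodule
```

In a typical async FIFO the Gray-coded pointer is registered in its own clock domain before it is passed through a 2-FF synchroniser, so that the synchroniser never sees combinational glitches.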
12. Parameterisation and Scalable RTL Design
Writing parameterisable RTL is a hallmark of professional RTL design. A parameterisable module can be reused across multiple projects, multiple chips, and multiple configurations without rewriting the code. Using parameter and localparam in Verilog (or generic in VHDL) to define bus widths, depths, and other configuration values is a fundamental best practice.
Verilog – Parameterisable FIFO Controller
module fifo_ctrl #(
parameter DATA_WIDTH = 8, // Configurable data width
parameter FIFO_DEPTH = 16, // Configurable depth
parameter ADDR_WIDTH = 4 // log2(FIFO_DEPTH)
) (
input wire clk, rst_n,
input wire wr_en, rd_en,
input wire [DATA_WIDTH-1:0] wr_data,
output reg [DATA_WIDTH-1:0] rd_data,
output wire full, empty
);
reg [DATA_WIDTH-1:0] mem [FIFO_DEPTH-1:0];
reg [ADDR_WIDTH:0] wr_ptr, rd_ptr; // Extra bit for full/empty detect
assign full = (wr_ptr == {~rd_ptr[ADDR_WIDTH], rd_ptr[ADDR_WIDTH-1:0]});
assign empty = (wr_ptr == rd_ptr);
always @(posedge clk or negedge rst_n) begin
if (!rst_n) begin
wr_ptr <= {(ADDR_WIDTH+1){1'b0}}; // Verilog replication ('0 is SystemVerilog-only)
rd_ptr <= {(ADDR_WIDTH+1){1'b0}};
end else begin
if (wr_en && !full) begin
mem[wr_ptr[ADDR_WIDTH-1:0]] <= wr_data;
wr_ptr <= wr_ptr + 1'b1;
end
if (rd_en && !empty) begin
rd_data <= mem[rd_ptr[ADDR_WIDTH-1:0]];
rd_ptr <= rd_ptr + 1'b1;
end
end
end
endmodule
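In the FIFO above, `ADDR_WIDTH` has to be kept equal to log2(`FIFO_DEPTH`) by hand. One possible alternative, assuming a tool flow that supports the Verilog-2005 `$clog2` system function, is to derive the width from the depth so the two cannot drift apart. The module below is a hypothetical, stripped-down pointer that illustrates only this idea; it is not a replacement for the FIFO.

```verilog
// Hypothetical example: pointer width derived from the depth
module ptr_width_sketch #(
    parameter FIFO_DEPTH = 16                 // assumed to be a power of two
) (
    input  wire                      clk,
    input  wire                      rst_n,
    input  wire                      incr,
    output reg  [$clog2(FIFO_DEPTH):0] ptr    // one extra MSB for full/empty detection
);
    localparam ADDR_WIDTH = $clog2(FIFO_DEPTH);  // 4 when FIFO_DEPTH = 16

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            ptr <= {(ADDR_WIDTH+1){1'b0}};
        else if (incr)
            ptr <= ptr + 1'b1;
    end
endmodule
```

With this approach a user overrides only `FIFO_DEPTH` at instantiation, and the address width follows automatically.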
13. RTL Coding Guidelines and Best Practices
Professional RTL design follows strict coding guidelines that ensure the code is readable, maintainable, synthesisable, and produces predictable hardware. The following guidelines are used by top semiconductor companies and are also enforced by RTL lint tools like Synopsys SpyGlass and Mentor Questa Lint.
| # | Guideline | Reason |
|---|---|---|
| 1 | Always use <= for sequential and = for combinational | Prevents sim-synth mismatches |
| 2 | Always use @(*) (or always_comb) for combinational logic | Prevents simulation-synthesis mismatches caused by incomplete sensitivity lists |
| 3 | Always include default in case statements | Prevents latch inference |
| 4 | Always cover all branches in if-else for combinational | Prevents latch inference |
| 5 | Never use delays (#) in RTL | Not synthesisable |
| 6 | Never use initial blocks in synthesisable RTL | Not synthesisable (except FPGA RAM init) |
| 7 | Use parameter for all constants – no magic numbers | Readability and reusability |
| 8 | One clock domain per always block – never mix clocks | Prevents CDC issues |
| 9 | Name all flip-flops and state signals descriptively | Readability and debug |
| 10 | Add file header with author, date, description, revision | Documentation standard |
| 11 | Separate combinational and sequential logic into different always blocks | Cleaner synthesis, easier debug |
| 12 | Use consistent naming conventions (e.g., _n suffix for active-low signals) | Team readability |
| 13 | Avoid full_case / parallel_case synthesis pragmas – use an explicit default instead | Pragmas are invisible to the simulator, so they can cause sim-synth mismatches and hurt tool portability |
| 14 | Keep module size manageable – < 500 lines per module | Synthesis and debug efficiency |
| 15 | Use hierarchical design – break large functions into sub-modules | Enables incremental synthesis and team collaboration |
14. RTL Lint Checks and Static Analysis
Before any RTL design is handed to the verification team or synthesis tool, it must pass lint analysis – an automated static analysis that checks the code for coding guideline violations, potential simulation-synthesis mismatches, and logic correctness issues. Lint is the first quality gate in the VLSI design flow after RTL coding.
Industry RTL Lint Tools
- Synopsys SpyGlass: The industry-leading RTL lint tool – checks for hundreds of coding quality rules, CDC issues, and synthesis issues
- Mentor Questa Lint (Siemens): Comprehensive lint and CDC analysis
- Cadence HAL: Cadence’s RTL analysis tool
- Verilator (open source): Fast Verilog linter and simulator – excellent for open-source RTL design
Common RTL Lint Violations
| Violation | Severity | Cause | Fix |
|---|---|---|---|
| Latch inferred | 🔴 Error | Incomplete always block coverage | Add default assignment or else clause |
| Incomplete sensitivity list | 🔴 Error | always @(a) instead of always @(*) | Use always @(*) or always_comb |
| Blocking in sequential block | 🔴 Error | = used inside always @(posedge clk) | Change to <= |
| Multiple drivers on net | 🔴 Error | Two always blocks drive same signal | Merge into single always block |
| Bit-width mismatch | 🟡 Warning | Assigning 8-bit to 4-bit without truncation intent | Explicit width matching or truncation |
| Undriven output | 🟡 Warning | Output port not assigned in all conditions | Assign default value |
| Unsynthesisable construct | 🔴 Error | #delay or initial in RTL | Remove or move to testbench |
| CDC violation | 🔴 Error | Multi-bit bus crossing clock domain without sync | Add async FIFO or handshake |
15. Writing RTL That Synthesises Well
Good RTL design is not just about functional correctness – it must also synthesise efficiently to meet timing, power, and area targets. The way you write RTL has a direct impact on the quality of the synthesised hardware. Here are the key principles for writing synthesis-friendly RTL:
Arithmetic Operations and Critical Path Awareness
Every arithmetic operation in RTL maps to physical logic gates. Addition, subtraction and comparisons are fast. Multiplication maps to dedicated multiplier cells or DSP blocks. Division is very slow and area-intensive – always try to replace division with right-shift (for powers of 2) or dedicated lookup-table-based solutions in timing-critical paths.
❌ Slow Critical Path
// Deep chain of operations = long critical path
always @(*) begin
result = (a + b) * c / d + e - f;
end
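✅ Pipelined for Speed
// Break into pipeline stages; result is valid 3 cycles after the inputs.
// Operands needed in later stages are delayed (c_d1, d_d1, ...) so that
// every stage works on values from the same input sample.
always @(posedge clk) begin
    stage1 <= a + b;            // Cycle 1
    c_d1   <= c;
    d_d1   <= d;
    ef_d1  <= e - f;
    stage2 <= stage1 * c_d1;    // Cycle 2
    d_d2   <= d_d1;
    ef_d2  <= ef_d1;
    // Cycle 3: the divide is still the slowest stage
    result <= stage2 / d_d2 + ef_d2;
end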
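The advice to replace division by a power of 2 with a right shift can be sketched as follows. This is a minimal illustration for unsigned operands only; the module name `div_pow2` and the parameters `WIDTH` and `SHIFT` are assumptions made for the example.

```verilog
// Hypothetical example: unsigned divide by 2**SHIFT using a right shift.
module div_pow2 #(
    parameter WIDTH = 16,
    parameter SHIFT = 3                 // divide by 2**3 = 8
) (
    input  wire [WIDTH-1:0] dividend,
    output wire [WIDTH-1:0] quotient
);
    // For unsigned values, x / (2**SHIFT) equals x >> SHIFT.
    // A constant shift is only wiring, so no divider logic is built.
    assign quotient = dividend >> SHIFT;
endmodule
```

For signed operands the two are not interchangeable: an arithmetic shift (`>>>`) rounds toward negative infinity, while `/` rounds toward zero, so negative values need a correction term.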
16. Common RTL Design Mistakes and How to Avoid Them
| # | Mistake | Consequence | How to Avoid |
|---|---|---|---|
| 1 | Using #delay in synthesisable RTL | Synthesis ignores delay – hardware behaves differently from simulation | Never use # delays in RTL – use SDC constraints for timing |
| 2 | Incomplete case / if-else (no default) | Latch inferred – RTL lint error, timing analysis failure | Always include default in case, else in if for combinational |
| 3 | Using blocking assignment in sequential logic | Race conditions – simulation passes but silicon fails | Always use <= in sequential always blocks |
| 4 | Missing CDC synchroniser on cross-domain signals | Metastability – intermittent failures in silicon | Always use 2-FF synchroniser or async FIFO for CDC |
| 5 | Gating the clock in RTL (using logic to control clock) | Clock glitches → data corruption; DFT/STA nightmare | Use clock enable signals with ICG (Integrated Clock Gating) cells |
| 6 | Reset fan-out too high (one reset driving millions of FFs) | Reset timing (recovery/removal) violations; reset de-assertion does not reach all flip-flops in the same cycle | Use reset tree with buffers; let synthesis handle reset tree |
| 7 | Combinational loops (output feeds directly back to input with no register) | Simulation glitching, oscillation; unsynthesisable in most cases | Add a register to break every feedback path |
| 8 | Unintentional X propagation (uninitialised signals) | X values are resolved optimistically in RTL simulation – simulation passes but real bugs are masked | Initialise all flip-flops via reset; use SVA X-checks |
| 9 | Over-complicated RTL in single module | Slow synthesis, hard to debug, poor reusability | Break into hierarchical sub-modules |
| 10 | Not running lint before handoff | RTL with latches, CDC issues, coding violations reaches synthesis | Make SpyGlass lint clean a mandatory RTL handoff criterion |
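Mistake 5 in the table above (gating the clock with logic) is usually avoided by writing a clock enable instead. The following is a minimal sketch; the module name `enable_reg` and its ports are hypothetical.

```verilog
// Hypothetical example: clock-enable register instead of a gated clock.
module enable_reg #(
    parameter WIDTH = 8
) (
    input  wire             clk,
    input  wire             rst_n,   // active-low asynchronous reset
    input  wire             en,      // clock enable
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    // Avoid:  assign gated_clk = clk & en;
    // That AND gate can glitch if en changes while clk is high.
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= {WIDTH{1'b0}};
        else if (en)
            q <= d;                  // q holds its value when en is low
    end
endmodule
```

Written this way, the clock stays clean in RTL, and a synthesis flow with clock-gating insertion enabled can typically map the enable onto an ICG cell.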
🎯 Key Takeaways – RTL Design
- RTL design describes digital hardware in terms of register transfers per clock cycle – it is hardware description, not software programming
- RTL code must be synthesisable – all non-synthesisable constructs (#delay, initial, $display) belong in testbenches only
- Use <= (non-blocking) for sequential logic and = (blocking) for combinational – mixing causes simulation-synthesis mismatches
- Always include default in case statements and else in combinational if-else to prevent latch inference
- Use always @(*) for all combinational logic to ensure complete sensitivity lists
- FSM coding with three separate always blocks (state register, next-state, output) is the industry best practice
- CDC signals must always be synchronised – 2-FF synchroniser for single bits, async FIFO for multi-bit buses
- Use parameterised RTL modules for reusability across chips and projects
- Always run RTL lint (SpyGlass) before handoff to verification and synthesis
- Clean RTL = faster synthesis, better timing closure, fewer silicon respins
17. Frequently Asked Questions (FAQ)
Q1: What is RTL design in VLSI?
RTL design in VLSI stands for Register Transfer Level design. It is the process of describing a digital circuit’s functionality using Hardware Description Languages (Verilog, SystemVerilog, or VHDL) at a level of abstraction where the design is described in terms of data transfers between registers per clock cycle and the combinational logic that transforms that data. RTL design is the primary input for logic synthesis tools in the VLSI design flow and is the standard methodology for all digital IC design.
Q2: What is the difference between synthesisable and non-synthesisable Verilog?
Synthesisable Verilog constructs are those that logic synthesis tools (like Synopsys Design Compiler) can translate into actual hardware gates. These include: always blocks with clock or @(*), assign statements, if-else, case, for loops with fixed bounds, module instantiation, and arithmetic/bitwise operators. Non-synthesisable constructs are those used only for simulation: #delay, initial blocks, $display/$monitor/$finish system tasks, force/release, fork/join, and dynamic memory allocation. Using non-synthesisable constructs in RTL code causes the synthesis tool to error out, to silently ignore the construct (as with #delay), or to produce hardware that behaves differently from simulation.
Q3: When should I use blocking vs non-blocking assignments in RTL design?
Use non-blocking assignments (<=) in ALL sequential (clock-edge triggered) always blocks – this ensures that all flip-flops update simultaneously at the clock edge, which is the correct hardware behaviour. Use blocking assignments (=) in ALL combinational (always @*) blocks – this ensures proper sequential evaluation of combinational logic within the block. Never mix both types in the same always block. This is arguably the single most important rule in Verilog RTL design.
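The rule can be sketched in one small module with a sequential block and a combinational block. The module name `assign_styles` and all signal names are hypothetical, chosen only for illustration.

```verilog
// Hypothetical example: non-blocking for sequential, blocking for combinational.
module assign_styles (
    input  wire       clk,
    input  wire       d,
    input  wire [3:0] a,
    input  wire [3:0] b,
    output reg        q2,
    output reg  [4:0] sum_plus_one
);
    reg       q1;
    reg [4:0] tmp;

    // Sequential: non-blocking. q2 samples the OLD value of q1,
    // so this describes a two-stage shift register.
    always @(posedge clk) begin
        q1 <= d;
        q2 <= q1;
    end

    // Combinational: blocking. tmp is computed first and then used
    // immediately on the next line, like a chain of gates.
    always @(*) begin
        tmp          = a + b;        // 5-bit result keeps the carry
        sum_plus_one = tmp + 5'd1;
    end
endmodule
```

If the sequential block used blocking assignments instead, q2 would receive the new q1 in the same clock edge and the two flip-flops would collapse into one in simulation.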
Q4: What causes latch inference in RTL design and how to avoid it?
A latch is inferred in RTL when a signal in a combinational always block is not assigned under all possible input conditions. The synthesis tool assumes that if a signal is not assigned, it must “remember” its previous value – which implies a latch. To avoid latch inference: (1) Always include a default assignment at the beginning of combinational always blocks. (2) Always include a default case in case statements. (3) Always include an else clause in if-else statements within combinational logic. Running RTL lint (SpyGlass) will flag all latch inference warnings.
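A minimal sketch of the three rules applied together is shown below. The module name `decode_no_latch` and its ports are hypothetical.

```verilog
// Hypothetical example: combinational decoder with no inferred latch.
module decode_no_latch (
    input  wire [1:0] sel,
    input  wire       enable,
    output reg  [3:0] onehot
);
    always @(*) begin
        onehot = 4'b0000;                 // default assignment at the top
        if (enable) begin
            case (sel)
                2'b00:   onehot = 4'b0001;
                2'b01:   onehot = 4'b0010;
                2'b10:   onehot = 4'b0100;
                2'b11:   onehot = 4'b1000;
                default: onehot = 4'b0000; // covers X/Z values in simulation
            endcase
        end
        // No else needed here: the default assignment already covers
        // the enable == 0 path, so onehot is assigned on every path.
    end
endmodule
```

The default assignment at the top is the most robust of the three techniques, because it keeps the block latch-free even when a later edit adds a branch that forgets to assign the signal.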
Q5: What is Clock Domain Crossing (CDC) and why is it important in RTL design?
Clock Domain Crossing (CDC) occurs when a signal must cross from logic driven by one clock to logic driven by a different, asynchronous clock. Without proper CDC handling, the receiving flip-flop may go metastable – its output becomes indeterminate, causing unpredictable chip behaviour. CDC is handled using synchronisers: a 2-FF synchroniser for single-bit signals, and an asynchronous FIFO (with Gray-coded pointers) for multi-bit data buses. CDC violations are a leading cause of first-silicon failures and must be verified using dedicated CDC analysis tools (SpyGlass CDC, Questa CDC) before tapeout.
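A 2-FF synchroniser for a single-bit signal can be sketched as follows. The module name `sync_2ff` and its port names are hypothetical, and the sketch assumes `async_in` is driven directly by a flip-flop in the source domain with no combinational logic in between.

```verilog
// Hypothetical example: 2-FF synchroniser for ONE bit.
// Not suitable for multi-bit buses: each bit could resolve in a
// different cycle, so use an async FIFO or a handshake for those.
module sync_2ff (
    input  wire clk_dst,    // destination-domain clock
    input  wire rst_n,      // active-low asynchronous reset
    input  wire async_in,   // signal from the other clock domain
    output wire sync_out    // safe to use in the clk_dst domain
);
    reg meta;               // first stage: may go metastable
    reg stable;             // second stage: gives meta a full cycle to settle

    always @(posedge clk_dst or negedge rst_n) begin
        if (!rst_n) begin
            meta   <= 1'b0;
            stable <= 1'b0;
        end else begin
            meta   <= async_in;
            stable <= meta;
        end
    end

    assign sync_out = stable;
endmodule
```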
Q6: What is the three-process FSM coding style?
The three-process FSM coding style separates the FSM implementation into three distinct always blocks: (1) The state register – a sequential always block that updates current_state to next_state on each clock edge. (2) The next-state logic – a combinational always block that determines the next state based on current state and inputs. (3) The output logic – a combinational always block that generates outputs based on the current state (Moore) or current state and inputs (Mealy). This style is the industry best practice because it is the most readable, most maintainable, and most synthesis-friendly way to code FSMs.
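As a sketch of this style, here is a small Moore FSM with three always blocks. The states (IDLE, RUN, DONE), the module name `fsm_three_process` and the ports are hypothetical and serve only to show the structure.

```verilog
// Hypothetical example: three-process Moore FSM.
module fsm_three_process (
    input  wire clk,
    input  wire rst_n,      // active-low asynchronous reset
    input  wire start,
    input  wire finish,
    output reg  busy,
    output reg  done
);
    localparam [1:0] IDLE = 2'd0,
                     RUN  = 2'd1,
                     DONE = 2'd2;

    reg [1:0] current_state, next_state;

    // (1) State register: sequential, non-blocking
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) current_state <= IDLE;
        else        current_state <= next_state;
    end

    // (2) Next-state logic: combinational, blocking
    always @(*) begin
        next_state = current_state;          // default: stay in place
        case (current_state)
            IDLE:    if (start)  next_state = RUN;
            RUN:     if (finish) next_state = DONE;
            DONE:    next_state = IDLE;
            default: next_state = IDLE;      // recover from unused encoding
        endcase
    end

    // (3) Output logic: Moore outputs depend only on current_state
    always @(*) begin
        busy = (current_state == RUN);
        done = (current_state == DONE);
    end
endmodule
```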
18. Conclusion
RTL design is the bedrock of every digital chip manufactured in the world. It is the stage where hardware engineers translate architecture specifications into concrete, synthesisable hardware descriptions using Verilog, SystemVerilog, or VHDL. The quality of the RTL code written at this stage directly determines the success of everything that follows in the VLSI design flow – from how quickly verification achieves coverage closure, to how efficiently logic synthesis produces a quality netlist, to whether the physical design team achieves timing closure without costly design changes.
In this comprehensive guide, we covered all aspects of professional RTL design: the definition and abstraction levels, Verilog and VHDL language fundamentals, synthesisable vs non-synthesisable constructs, sequential and combinational logic coding, FSM coding best practices, reset strategies, CDC handling, parameterisation, coding guidelines, lint checking, and the most common RTL mistakes that cost engineers time and companies money.
The key message is this: RTL design is hardware description. Every line of code you write becomes physical silicon. Write it with the care and precision that hardware demands – not the casualness that software sometimes permits. Clean, lint-free, well-structured, synthesisable RTL code is the foundation of every successful chip.
