RTL Design: Writing Synthesisable Verilog and VHDL (Complete Guide)

What this article covers: This guide explains what Register Transfer Level (RTL) means, how to write synthesisable Verilog and VHDL, which constructs are synthesisable and which are not, RTL coding best practices, FSM coding styles, reset strategies, clock domain crossing, lint rules, and how RTL quality directly affects synthesis, timing closure and silicon success.

[Figure: RTL design diagram showing 15 rules for synthesisable Verilog and VHDL, covering FSM design, coding practices, and digital logic implementation]

RTL design is the foundation of nearly every modern digital chip. From the simplest IoT microcontroller to the most powerful AI accelerator, the logic is first described as RTL code, written in Verilog, SystemVerilog, or VHDL, before any synthesis tool, physical design tool, or foundry process touches it. RTL quality is one of the most important factors in a successful chip design: a bug found after tapeout can cost millions of dollars and months of schedule, while the same bug found in simulation typically costs hours.

In the VLSI design flow, RTL design comes right after system specification and architecture definition. This is where engineers write the hardware description language (HDL) code that defines exactly what the chip will do. The code must be synthesisable, meaning that logic synthesis tools (such as Synopsys Design Compiler or Cadence Genus) can translate it correctly into a gate-level netlist. Non-synthesisable code may simulate correctly but will produce wrong hardware, or no hardware at all.

This guide teaches you how to write professional, synthesisable RTL in both Verilog and VHDL, from first principles to advanced industry techniques.

    1. What is RTL Design?

    RTL design stands for Register Transfer Level design. It is a method of describing digital hardware using a Hardware Description Language (HDL), where the description focuses on how data moves between registers (flip-flops) on each clock cycle, and what combinational logic transforms that data between those transfers. The term “register transfer level” captures both concepts: registers (the flip-flops that store state) and transfers (the movement and transformation of data between them, mediated by combinational logic).

    The RTL abstraction level sits between the behavioural level above it and the gate level below it:

    Abstraction Level | Description | Example | Used For
    Behavioural Level | Describes what the system does algorithmically; no timing, no registers implied | if (a > b) max = a; else max = b; | High-level synthesis input, early validation
    RTL Level (this guide) | Describes data transfers between registers per clock cycle, with combinational logic between | always @(posedge clk) q <= d; | Logic synthesis input; industry standard
    Gate Level | Describes the circuit as actual logic gates (AND, OR, NAND, flip-flops) from a standard cell library | Netlist of cell instances and connections | Physical design input, gate-level simulation
    Switch Level | Describes transistor-level connectivity (NMOS/PMOS switches) | SPICE netlist | Analog/custom IC design

    RTL design is the dominant design methodology in the semiconductor industry because it provides the right level of abstraction: detailed enough to be synthesised into hardware accurately, but abstract enough to allow designers to think in terms of functionality and architecture rather than transistor-level details. When you write RTL, you are describing hardware – not writing software. Every line of RTL code implies actual physical logic gates that will be manufactured on silicon.

    💡 RTL Design Is Hardware Description, Not Programming

    The most common mistake beginners make is treating RTL like software code. Unlike software, in RTL design: (1) All always blocks execute concurrently, not sequentially. (2) Every signal assignment implies physical wires and logic gates. (3) Timing is real: setup and hold time violations cause silicon failures. (4) You cannot "allocate memory": storage is always explicit flip-flops or SRAM.
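
Point (1) in the note above is easiest to see with a small sketch. The module below is illustrative only (the name swap_regs and the reset values are assumptions, not taken from any particular design): two always blocks run on the same clock edge, each samples the old value of the other register, and the two registers swap every cycle. Software executing the same two statements in sequence would not swap them.

```verilog
// Two always blocks that run concurrently: on each rising clock edge
// both sample the OLD values, so x and y swap every cycle.
module swap_regs (
    input  wire       clk,
    input  wire       rst_n,   // synchronous, active-low reset
    output reg  [7:0] x,
    output reg  [7:0] y
);
    // Block 1: x takes the previous value of y
    always @(posedge clk) begin
        if (!rst_n) x <= 8'hAA;
        else        x <= y;
    end

    // Block 2: y takes the previous value of x (same clock edge)
    always @(posedge clk) begin
        if (!rst_n) y <= 8'h55;
        else        y <= x;
    end
endmodule
```

In hardware this is simply two registers whose outputs are cross-connected to each other's inputs; the order in which the blocks appear in the file makes no difference.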

    2. RTL vs Behavioural vs Gate-Level Abstraction

    To understand RTL design deeply, you must understand how it differs from behavioural coding and gate-level coding. The same hardware can be described at all three levels, but only RTL is the standard input for logic synthesis tools in the industrial VLSI design flow.

    Consider a simple 2:1 multiplexer (MUX). Here is how it looks at all three levels:

    Behavioural Level (Not Synthesisable in This Form)

    Verilog — Behavioural

    // Behavioural - uses delay, not synthesisable in this form
    module mux_behav (
        input  a, b, sel,
        output reg y
    );
    always @(a or b or sel) begin
        #5 y = sel ? a : b;   // #5 delay - NOT synthesisable
    end
    endmodule

    RTL Level (Synthesisable – Industry Standard)

    Verilog — RTL (Synthesisable)

    // RTL — fully synthesisable, no delays, proper sensitivity list
    module mux_rtl (
        input  wire a, b, sel,
        output reg  y
    );
    always @(*) begin         // @(*) = complete sensitivity list
        if (sel)
            y = a;
        else
            y = b;
    end
    endmodule

    Gate Level (Post-Synthesis Netlist)

    Verilog – Gate Level (Generated by Synthesis Tool)

    // Gate-level - generated by synthesis tool, not written by hand
    module mux_gate (a, b, sel, y);
      input a, b, sel;
      output y;
      MX2_HVT U1 (.A(a), .B(b), .S(sel), .Y(y)); // Standard cell instance
    endmodule

    The RTL version is what you write as an engineer. The gate-level version is what the synthesis tool produces. Gate-level is never hand-coded for complex designs – that would take years. This illustrates exactly why RTL design is so powerful: one line of RTL can produce hundreds of gates automatically.

    3. Verilog vs VHDL vs SystemVerilog: Which to Use for RTL Design?

    There are three main HDLs used for RTL design in the industry today. Choosing the right one depends on your design domain, company preference, and the geographic region you work in.

    Language | First Released | Primary Use | Region / Domain | Strengths | Weaknesses
    Verilog (IEEE 1364) | 1984 (IEEE standard in 1995) | RTL design, synthesis | USA, Asia (dominant) | Concise syntax, C-like, easy to learn, very widely supported | Weaker type system, easier to write bugs
    VHDL (IEEE 1076) | 1987 | RTL design, simulation | Europe, defence/aerospace | Strong typing, verbose but explicit, DoD/aerospace standard | Verbose, steeper learning curve, slower to write
    SystemVerilog (IEEE 1800) | 2002 (IEEE standard in 2005) | RTL design + verification | Global (increasingly dominant) | Superset of Verilog + OOP verification features, UVM support | Large language, complex features if misused

    💡 Industry Recommendation (2025)

    For RTL design: SystemVerilog is the modern standard. It gives you all of Verilog's strengths with improved syntax, explicit data types, interfaces, and better synthesis features. Many leading companies (Qualcomm, Apple, NVIDIA, Google) use SystemVerilog for RTL. VHDL remains dominant in European aerospace and defence (DO-254). Pure Verilog is still widely used in legacy designs and academia.

    4. Verilog Basics for RTL Design

    Verilog is a Hardware Description Language that forms the backbone of RTL design in most of the semiconductor industry. Understanding its core constructs is essential before writing any synthesisable RTL code.

    4.1 Module and Port Declaration

    In Verilog, every design unit is a module. A module has ports (inputs and outputs) and internal logic. This is the basic building block of any RTL design.

    Verilog – Module Structure

    // Module declaration — best practice for RTL design
    module adder_8bit (
        input  wire [7:0] a,        // 8-bit input A
        input  wire [7:0] b,        // 8-bit input B
        input  wire        cin,     // Carry input
        output wire [7:0] sum,     // Sum output
        output wire        cout    // Carry out
    );
        // Continuous assignment — combinational logic
        assign {cout, sum} = a + b + cin;
    
    endmodule

    4.2 Verilog Data Types for RTL Design

    Understanding data types is critical for writing correct, synthesisable RTL design in Verilog:

    Data Type | Use in RTL | Synthesisable? | Example
    wire | Connects module ports and continuous assignments; represents physical wires | ✅ Yes | wire [7:0] data_bus;
    reg | Holds a value in procedural (always) blocks; does NOT always imply a flip-flop. Can be combinational or sequential depending on usage | ✅ Yes | reg [3:0] count;
    integer | 32-bit signed; useful as a for-loop index in RTL and testbenches (generate loops use genvar instead) | ⚠️ Loop index only; avoid for data signals in synthesisable RTL | integer i;
    parameter | Compile-time constant; essential for parameterisable RTL design | ✅ Yes | parameter WIDTH = 8;
    localparam | Local constant within module; cannot be overridden from outside | ✅ Yes | localparam IDLE = 2'b00;
    real | Floating-point; for simulation only | ❌ No | real freq = 1.5e9;
    time | Simulation time type | ❌ No | time t_start;
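
The reg row above is the one that most often confuses newcomers, so here is a minimal sketch (the module name reg_usage and its ports are made up for illustration). Both outputs are declared reg, but only the one assigned on a clock edge becomes flip-flops.

```verilog
// Same data type (reg), two different kinds of hardware.
module reg_usage (
    input  wire       clk,
    input  wire [3:0] a, b,
    output reg  [3:0] sum_comb,  // declared reg, but synthesises to pure logic
    output reg  [3:0] sum_reg    // declared reg, synthesises to 4 flip-flops
);
    // Combinational: no clock edge, assigned on every path -> adder only
    always @(*) begin
        sum_comb = a + b;
    end

    // Sequential: clock edge in the sensitivity list -> flip-flops
    always @(posedge clk) begin
        sum_reg <= sum_comb;
    end
endmodule
```

What decides the hardware is the always block the signal is assigned in, not the keyword used to declare it.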

    4.3 always Block – The Core of RTL Design

    The always block is the most fundamental construct in Verilog RTL design. There are two types used in synthesisable RTL:

    Verilog — Sequential always Block (Flip-Flop)

    // Sequential logic — infers flip-flops
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)
            q <= 8'h00;     // Non-blocking: always use <= for sequential RTL
        else
            q <= d;
    end

    Verilog — Combinational always Block

    // Combinational logic — @(*) ensures complete sensitivity list
    always @(*) begin
        case (sel)
            2'b00: y = a;    // Blocking: always use = for combinational RTL
            2'b01: y = b;
            2'b10: y = c;
            default: y = d; // ALWAYS include default — prevents latch inference
        endcase
    end

    ⚠️ Critical RTL Rule: Blocking vs Non-Blocking Assignments

    This is one of the most important rules in Verilog RTL design, and one of the most common sources of bugs:

    • Sequential logic (always @(posedge clk)): ALWAYS use non-blocking assignments (<=)
    • Combinational logic (always @(*)): ALWAYS use blocking assignments (=)
    • NEVER mix both types in the same always block

    Using blocking assignments in sequential blocks creates simulation races: the result depends on the order in which the simulator evaluates the always blocks, so the simulation can pass while the synthesised hardware behaves differently.
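
A two-stage pipeline is a compact way to see why the rule matters. The sketch below uses hypothetical names (pipe2, q1, q2) and shows the recommended non-blocking form; the closing comment describes what the blocking form would do instead.

```verilog
// Two flip-flops in series, written with non-blocking assignments.
module pipe2 (
    input  wire clk,
    input  wire d,
    output reg  q1,
    output reg  q2
);
    // Non-blocking: both right-hand sides are sampled before either
    // register updates, so q2 receives the OLD q1 -> two stages of delay.
    always @(posedge clk) begin
        q1 <= d;
        q2 <= q1;
    end
    // With blocking assignments (q1 = d; q2 = q1;) q2 would see the NEW q1
    // in the same cycle, collapsing the pipeline to a single stage.
endmodule
```

With non-blocking assignments the two statements can be written in either order and still describe the same two-stage pipeline, which is exactly the order independence that real flip-flops have.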

    5. VHDL Basics for RTL Design

    VHDL (VHSIC Hardware Description Language) is the second major HDL used for RTL design. It is strongly typed, more verbose than Verilog, and widely used in European aerospace, defence and automotive chip design (many teams working to DO-254 or ISO 26262 standardise on VHDL, although neither standard mandates a particular language). Understanding VHDL RTL coding is essential for engineers working in these domains.

    5.1 VHDL Entity and Architecture

    In VHDL, every design unit consists of an entity (defines the interface – ports) and an architecture (defines the implementation – internal logic). This separation of interface and implementation is one of VHDL’s strengths for large-team RTL design.

    VHDL — Entity and Architecture

    -- VHDL RTL Design: 8-bit Register with Synchronous Reset
    library IEEE;
    use IEEE.STD_LOGIC_1164.ALL;
    use IEEE.NUMERIC_STD.ALL;
    
    -- Entity: defines ports (interface)
    entity reg_8bit is
        port (
            clk   : in  std_logic;
            rst_n : in  std_logic;
            en    : in  std_logic;
            d     : in  std_logic_vector(7 downto 0);
            q     : out std_logic_vector(7 downto 0)
        );
    end entity reg_8bit;
    
    -- Architecture: defines the implementation (behaviour)
    architecture rtl of reg_8bit is
    begin
        -- Sequential process: infers D flip-flops
        reg_proc : process(clk)
        begin
            if rising_edge(clk) then
                if rst_n = '0' then
                    q <= (others => '0');   -- Synchronous reset
                elsif en = '1' then
                    q <= d;                 -- Load data when enabled
                end if;
            end if;
        end process reg_proc;
    end architecture rtl;

    5.2 VHDL Data Types for RTL Design

    VHDL’s strong type system prevents many common RTL errors that are possible in Verilog. Key data types for synthesisable RTL design in VHDL include:

    VHDL Type | Description | RTL Use
    std_logic | 9-value logic type (U, X, 0, 1, Z, W, L, H, -) | Standard for all single-bit signals; requires IEEE.STD_LOGIC_1164
    std_logic_vector | Array of std_logic; most common multi-bit type | Buses, data paths: std_logic_vector(7 downto 0)
    unsigned | Unsigned integer (from NUMERIC_STD) | Arithmetic operations; use for arithmetic signals
    signed | Signed two's complement integer (from NUMERIC_STD) | Signed arithmetic operations
    integer | 32-bit integer; synthesisable with range constraint | integer range 0 to 255 (the range defines the bit width)
    boolean | True/false; synthesisable | Control signals, enable flags
    enumeration | User-defined enumerated types | FSM state encoding: type state_t is (IDLE, ACTIVE, DONE);

    5.3 VHDL Signal vs Variable

    One of VHDL’s most commonly misunderstood concepts in RTL design is the difference between signals and variables:

    Aspect | Signal | Variable
    Scope | Defined in architecture or package; visible everywhere in the architecture | Defined inside a process; local to that process only
    Update timing | Updated at the end of the delta cycle (after process suspension) | Updated immediately when assigned
    Synthesis result | Always maps to a wire or flip-flop | May or may not produce a register depending on usage context
    RTL recommendation | Use for inter-process communication, output ports, and main data paths | Use for local intermediate calculations within a process
    Assignment operator | <= (signal assignment) | := (variable assignment)

    6. Synthesisable vs Non-Synthesisable Constructs

    One of the most critical skills in RTL design is knowing exactly which language constructs are synthesisable and which are not. Using non-synthesisable constructs in RTL code is a common beginner mistake that causes the synthesis tool to either error out, produce incorrect hardware, or silently ignore the construct – leading to simulation-synthesis mismatches that are extremely hard to debug.

    Verilog: Synthesisable Constructs

    Construct | Synthesisable? | Infers
    assign (continuous assignment) | ✅ Yes | Combinational logic (wires and gates)
    always @(posedge clk) | ✅ Yes | Flip-flops (sequential logic)
    always @(*) | ✅ Yes | Combinational logic
    if-else in always | ✅ Yes | MUX (combinational) or conditional register update (sequential)
    case / casez / casex | ✅ Yes | MUX / decoder logic
    for loop with fixed bounds | ✅ Yes | Replicated logic (unrolled at compile time)
    module instantiation | ✅ Yes | Hierarchical structural design
    generate | ✅ Yes | Parameterisable repeated or conditional hardware structures
    parameter / localparam | ✅ Yes | Compile-time constants (no hardware)
    Arithmetic operators (+, -, *, /) | ✅ Yes (*, / may be large) | Adder, subtractor, multiplier, divider circuits
    Bitwise operators (&, |, ^, ~) | ✅ Yes | AND, OR, XOR, NOT gates
    Reduction operators (&, |, ^) | ✅ Yes | Multi-input AND, OR, XOR reduction trees
    Shift operators (<<, >>, <<<, >>>) | ✅ Yes | Barrel shifters or wire routing (constant shifts)
    Concatenation { , } | ✅ Yes | Wire concatenation / bus manipulation
    Conditional operator (? :) | ✅ Yes | 2:1 MUX
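
Two rows in the table above, the fixed-bound for loop and generate, both describe replicated hardware rather than iteration over time. The following sketch illustrates each; the module names bit_reverse and ones_count are made up for this example, and the 4-bit count output assumes WIDTH is at most 15.

```verilog
// generate-for: WIDTH separate wire connections, built at elaboration time
module bit_reverse #(
    parameter WIDTH = 8
) (
    input  wire [WIDTH-1:0] din,
    output wire [WIDTH-1:0] dout
);
    genvar g;
    generate
        for (g = 0; g < WIDTH; g = g + 1) begin : gen_rev
            assign dout[g] = din[WIDTH-1-g];
        end
    endgenerate
endmodule

// Procedural for loop with fixed bounds: unrolled into a chain of adders
module ones_count #(
    parameter WIDTH = 8
) (
    input  wire [WIDTH-1:0] din,
    output reg  [3:0]       count   // assumption: wide enough for WIDTH <= 15
);
    integer i;
    always @(*) begin
        count = 4'd0;                // default assignment, so no latch
        for (i = 0; i < WIDTH; i = i + 1)
            count = count + din[i];
    end
endmodule
```

In both cases the loop bound is a compile-time constant, which is what lets the synthesis tool unroll it into a fixed amount of logic.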

    Verilog: Non-Synthesisable Constructs (Testbench / Simulation Only)

    Construct | Why Not Synthesisable | Use Instead
    #delay (e.g., #10 clk = ~clk;) | Time delays have no physical hardware equivalent in combinational/sequential logic | Use timing constraints in SDC
    initial block | Executes once at time 0; no hardware equivalent (except FPGA with initialisation support) | Use synchronous reset for initialisation
    $display, $monitor, $finish | System tasks; simulator functions only | Not applicable in RTL; testbench only
    force / release | Simulator construct for overriding signals; no hardware equivalent | Testbench only
    fork / join | Parallel thread simulation; no hardware mapping | Testbench only
    Dynamic memory (new, queues, mailboxes) | SystemVerilog verification constructs; no hardware equivalent | SystemVerilog classes for testbench only
    File I/O ($fopen, $fread) | Operating system file access; no hardware equivalent | Testbench only
    Unbounded while loop | May execute indefinitely; synthesis cannot determine hardware size | Use for loop with fixed bounds
    wait statement | Event-based waiting; not synthesisable | Use synchronous state machines

    7. Writing Sequential Logic in RTL Design

    Sequential logic is the type of digital logic that has memory – its output depends not only on current inputs but also on past states. In RTL design, sequential logic is implemented using flip-flops, which are inferred whenever a signal is assigned inside a clock-edge-triggered always block.

    7.1 D Flip-Flop with Synchronous Reset

    Verilog — D Flip-Flop (Synchronous Reset)

    module dff_sync_rst (
        input  wire clk, rst_n, d,
        output reg  q
    );
        always @(posedge clk) begin
            if (!rst_n)    // Synchronous reset - checked at clock edge
                q <= 1'b0;
            else
                q <= d;
        end
    endmodule

    7.2 D Flip-Flop with Asynchronous Reset

    Verilog – D Flip-Flop (Asynchronous Reset)

    module dff_async_rst (
        input  wire clk, rst_n, d,
        output reg  q
    );
        // rst_n in sensitivity list = asynchronous reset
        always @(posedge clk or negedge rst_n) begin
            if (!rst_n)    // Asynchronous — fires regardless of clock
                q <= 1'b0;
            else
                q <= d;
        end
    endmodule

    7.3 Parameterisable Shift Register

    Verilog — Parameterisable Shift Register

    module shift_reg #(
        parameter WIDTH = 8,
        parameter DEPTH = 4
    ) (
        input  wire              clk, rst_n, en,
        input  wire [WIDTH-1:0] d_in,
        output wire [WIDTH-1:0] d_out
    );
        reg [WIDTH-1:0] shift_mem [DEPTH-1:0];
        integer i;
    
        always @(posedge clk) begin
            if (!rst_n) begin
                for (i = 0; i < DEPTH; i = i + 1)
                    shift_mem[i] <= {WIDTH{1'b0}};
            end else if (en) begin
                shift_mem[0] <= d_in;
                for (i = 1; i < DEPTH; i = i + 1)
                    shift_mem[i] <= shift_mem[i-1];
            end
        end
        assign d_out = shift_mem[DEPTH-1];
    endmodule

    8. Writing Combinational Logic in RTL Design

    Combinational logic produces outputs that depend only on the current values of its inputs – there is no memory, no clock. In RTL design, combinational logic is described using continuous assignments (assign statements) or combinational always @(*) blocks. The synthesis tool maps this into the appropriate logic gates from the standard cell library.

    8.1 Priority Encoder Using always @(*)

    Verilog — Priority Encoder (4-to-2)

    module priority_enc_4to2 (
        input  wire [3:0] req,     // 4-bit request bus
        output reg  [1:0] grant,   // 2-bit grant output
        output reg          valid    // Valid output
    );
        always @(*) begin
            valid = 1'b1;
            if      (req[3]) grant = 2'b11;  // Highest priority
            else if (req[2]) grant = 2'b10;
            else if (req[1]) grant = 2'b01;
            else if (req[0]) grant = 2'b00;
            else begin
                grant = 2'b00;
                valid = 1'b0;              // No request active
            end
        end
    endmodule

    8.2 Avoiding Latch Inference in RTL Design

    One of the most critical rules in combinational RTL design is to avoid unintentional latch inference. A latch is inferred when a signal in a combinational always block is not assigned in all possible code paths. Latches are timing-analysis nightmares — they are level-sensitive, not edge-triggered, which causes problems with static timing analysis and often indicates a design error.

    ❌ Latch Inferred (Bug)

    always @(*) begin
      if (sel)
        y = a;    // What is y when
                  // sel=0? Latch!
    end

    ✅ No Latch – Correct RTL

    always @(*) begin
      if (sel)
        y = a;
      else
        y = b;  // All paths covered
    end

    ❌ Incomplete case (Latch Bug)

    always @(*) begin
      case (opcode)
        2'b00: result = a + b;
        2'b01: result = a - b;
        // Missing 2'b10, 2'b11!
        // = Latch inferred
      endcase
    end

    ✅ Default covers all paths

    always @(*) begin
      case (opcode)
        2'b00: result = a + b;
        2'b01: result = a - b;
        2'b10: result = a & b;
        default: result = 8'h00;
        // Default prevents latch
      endcase
    end
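
A third way to avoid latches, alongside the else branch and the default case shown above, is to assign every output a default value at the top of the block. The sketch below assumes a hypothetical two-output ALU (alu_defaults) to show why this scales better when a block drives several signals.

```verilog
// Default assignments at the top of a combinational block
module alu_defaults (
    input  wire [1:0] opcode,
    input  wire [7:0] a, b,
    output reg  [7:0] result,
    output reg        carry
);
    always @(*) begin
        // Defaults first: every output has a value on every path
        result = 8'h00;
        carry  = 1'b0;
        case (opcode)
            2'b00: {carry, result} = a + b;  // only the add drives carry
            2'b01: result = a - b;
            2'b10: result = a & b;
            default: ;                       // 2'b11 keeps the defaults
        endcase
    end
endmodule
```

Without the default for carry, the three branches that do not mention it would each leave it unassigned and a latch would be inferred, even though the case statement itself has a default.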

    9. FSM (Finite State Machine) RTL Coding

    Finite State Machines (FSMs) are one of the most fundamental building blocks in RTL design. Nearly every digital controller, protocol handler, sequencer, and arbiter in a chip is implemented as an FSM. Understanding how to code FSMs correctly in RTL is essential for every VLSI engineer.

    9.1 Moore vs Mealy FSMs

    Aspect | Moore FSM | Mealy FSM
    Output depends on | Current state only | Current state AND current inputs
    Output timing | Changes only after a state transition, so the response to an input appears one clock cycle later | Combinational: responds to an input change within the same cycle
    Number of states | Typically more states needed | Typically fewer states needed
    Glitching | Input glitches cannot reach the outputs; register the outputs if state-decode glitches must also be avoided | Possible input glitch → output glitch
    Preferred for | Most RTL designs; safer, easier to verify | Protocol interfaces where immediate response needed
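
To make the Mealy column concrete, here is a minimal sketch of a rising-edge detector written as a two-state Mealy machine. The module and state names (rise_detect_mealy, S_LOW, S_HIGH) are illustrative assumptions. The output is a function of both the state and the live input, so the pulse appears in the same cycle in which the input rises.

```verilog
// Mealy FSM: output depends on the current state AND the current input
module rise_detect_mealy (
    input  wire clk, rst_n,
    input  wire in,
    output wire pulse
);
    localparam S_LOW  = 1'b0,   // input was low at the last clock edge
               S_HIGH = 1'b1;   // input was high at the last clock edge
    reg state;

    // State register with trivial next-state logic: remember the input level
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) state <= S_LOW;
        else        state <= in ? S_HIGH : S_LOW;
    end

    // Mealy output: asserted immediately when in rises while in S_LOW
    assign pulse = (state == S_LOW) && in;
endmodule
```

A Moore version of the same detector would need a third state dedicated to asserting the pulse, and the pulse would arrive one clock later. That is the states-versus-latency trade-off summarised in the table, and it also shows the cost: any glitch on in passes straight through to pulse.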

    9.2 Three-Process FSM Coding Style

    The three-process FSM coding style is a widely used best practice for RTL design. It separates the state register, next-state logic, and output logic into three distinct always blocks for clarity, maintainability, and synthesis friendliness:

    Verilog – Three-Process FSM (Moore – UART TX Controller Example)

    module uart_tx_ctrl (
        input  wire       clk, rst_n,
        input  wire       tx_start,   // Start transmission
        input  wire       tx_done,    // Bit-level done
        output reg        tx_en,      // Enable transmitter
        output reg        tx_busy     // Busy flag
    );
    
        // State encoding using localparams
        localparam [1:0]
            IDLE     = 2'b00,
            START    = 2'b01,
            TRANSMIT = 2'b10,
            STOP     = 2'b11;
    
        reg [1:0] curr_state, next_state;
    
        // PROCESS 1: State Register (Sequential)
        always @(posedge clk or negedge rst_n) begin
            if (!rst_n)
                curr_state <= IDLE;
            else
                curr_state <= next_state;
        end
    
        // PROCESS 2: Next-State Logic (Combinational)
        always @(*) begin
            next_state = curr_state; // Default: stay in current state
            case (curr_state)
                IDLE    : if (tx_start) next_state = START;
                START   :               next_state = TRANSMIT;
                TRANSMIT: if (tx_done)  next_state = STOP;
                STOP    :               next_state = IDLE;
                default:               next_state = IDLE;
            endcase
        end
    
        // PROCESS 3: Output Logic (Moore — based on state only)
        always @(*) begin
            // Default outputs to prevent latch inference
            tx_en   = 1'b0;
            tx_busy = 1'b0;
            case (curr_state)
                IDLE    : begin tx_en = 1'b0; tx_busy = 1'b0; end
                START   : begin tx_en = 1'b1; tx_busy = 1'b1; end
                TRANSMIT: begin tx_en = 1'b1; tx_busy = 1'b1; end
                STOP    : begin tx_en = 1'b0; tx_busy = 1'b1; end
                default : begin tx_en = 1'b0; tx_busy = 1'b0; end
            endcase
        end
    
    endmodule

    9.3 FSM State Encoding

    Encoding Style | Example (4 states) | Advantages | Disadvantages
    Binary | 00, 01, 10, 11 | Minimum flip-flops (ceil(log2 N)) | More complex next-state logic, slower
    One-Hot | 0001, 0010, 0100, 1000 | Fastest next-state logic, used in high-speed designs | More flip-flops (N flip-flops)
    Gray Code | 00, 01, 11, 10 | Only one bit changes per transition; good for CDC | More complex output decoding
    Johnson | 00, 10, 11, 01 (2-bit Johnson counter) | Power efficient; limited bit switching | Needs N/2 flip-flops, more than binary for larger N; requires special decoding
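
As a sketch of how the one-hot row looks in code, the four states from the UART example can be re-encoded with one flip-flop each. The module name onehot_fsm and the simplified port list are assumptions made for this illustration; only the localparam values and the output decode differ from a binary-encoded version.

```verilog
// One-hot encoded FSM: one flip-flop per state
module onehot_fsm (
    input  wire clk, rst_n,
    input  wire start, done,
    output wire busy
);
    // Exactly one bit is set in each legal state
    localparam [3:0]
        IDLE     = 4'b0001,
        START    = 4'b0010,
        TRANSMIT = 4'b0100,
        STOP     = 4'b1000;

    reg [3:0] curr_state, next_state;

    always @(posedge clk or negedge rst_n) begin
        if (!rst_n) curr_state <= IDLE;
        else        curr_state <= next_state;
    end

    always @(*) begin
        next_state = curr_state;                     // default: hold state
        case (curr_state)
            IDLE    : if (start) next_state = START;
            START   :            next_state = TRANSMIT;
            TRANSMIT: if (done)  next_state = STOP;
            STOP    :            next_state = IDLE;
            default :            next_state = IDLE;  // recover from illegal codes
        endcase
    end

    // Output decode is a single-bit test instead of a multi-bit compare
    assign busy = ~curr_state[0];
endmodule
```

Many synthesis tools can also re-encode a recognised FSM automatically, so whether to hand-code the encoding or leave it to the tool is a project-level choice.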

    10. Reset Strategies in RTL Design

    Reset strategy is one of the most important architectural decisions in RTL design. The reset strategy affects power consumption, timing analysis, design robustness, and functional safety compliance. There are two fundamental types of reset:

    Reset Type | How It Works | Verilog Sensitivity List | Advantages | Disadvantages
    Synchronous Reset | Reset is checked only at the active clock edge; reset must be held for at least one clock cycle | always @(posedge clk) | Clean STA: reset treated as data; no asynchronous paths; better for timing closure | Reset must be asserted for at least one full clock cycle; risk of missing reset if pulse too short
    Asynchronous Reset | Reset immediately forces the flip-flop to its reset state regardless of the clock | always @(posedge clk or negedge rst_n) | Instantaneous reset; works even if clock is stopped; required in some power-domain scenarios | Introduces asynchronous timing paths (recovery/removal constraints); deassertion must be synchronised to clock to avoid metastability

    Asynchronous Reset with Synchronous Deassertion (Best Practice)

    A widely used best practice, particularly for safety-critical RTL design, is to combine asynchronous assertion (reset can fire at any time) with synchronous deassertion (release is synchronised to the clock edge). This is implemented using a reset synchroniser:

    Verilog – Reset Synchroniser (Best Practice)

    // Reset synchroniser — 2-stage synchroniser prevents metastability
    // on reset deassertion
    module rst_sync (
        input  wire clk,
        input  wire async_rst_n,   // Async reset from power-on / button
        output wire sync_rst_n     // Synchronised reset to logic
    );
        reg [1:0] sync_ff;
    
        always @(posedge clk or negedge async_rst_n) begin
            if (!async_rst_n)
                sync_ff <= 2'b00;   // Async assertion propagates immediately
            else
                sync_ff <= {sync_ff[0], 1'b1}; // Sync deassertion through 2 FFs
        end
        assign sync_rst_n = sync_ff[1];
    endmodule

    11. Clock Domain Crossing (CDC) in RTL Design

    Clock Domain Crossing (CDC) is one of the most critical and error-prone aspects of RTL design. A CDC occurs whenever a signal crosses from logic clocked by one clock domain to logic clocked by a different, unrelated (or asynchronous) clock domain. If not handled correctly, CDC violations cause metastability – a state where a flip-flop’s output is neither a clean logic 0 nor a logic 1 – which leads to unpredictable, intermittent chip failures that are extremely difficult to debug.

    Types of CDC Signals and Their Solutions

    Signal Type | CDC Technique | Description
    Single-bit control signal | 2-FF Synchroniser | Two back-to-back flip-flops in the destination domain; standard synchroniser for low-frequency single-bit signals
    Single-bit pulse | Pulse synchroniser / Toggle synchroniser | Convert pulse to toggle, synchronise toggle, detect edge in destination domain
    Multi-bit data bus | Async FIFO (Gray-coded pointers) | FIFO with independent read and write clocks, Gray-coded pointers synchronised across domains; most robust solution
    Multi-bit control bus | Handshake protocol | Request/acknowledge handshake with synchronised enable signals
    Multi-bit near-static data | MUX synchroniser / Enable pulse | Data changes only when a synchronised enable/sample pulse is asserted

    Verilog – 2-FF Synchroniser (Single Bit CDC)

    module sync_2ff (
        input  wire clk_dst,   // Destination clock
        input  wire rst_n,
        input  wire d_src,    // Signal from source clock domain
        output wire d_dst     // Synchronised output in destination domain
    );
        reg ff1, ff2;
    
        always @(posedge clk_dst or negedge rst_n) begin
            if (!rst_n) begin
                ff1 <= 1'b0;
                ff2 <= 1'b0;
            end else begin
                ff1 <= d_src;    // First FF: may go metastable
                ff2 <= ff1;      // Second FF: resolves metastability
            end
        end
        assign d_dst = ff2;   // Clean synchronised signal
    endmodule
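
The 2-FF synchroniser above is only safe for level signals that stay stable for several destination clock cycles. For the single-bit pulse row of the table, a toggle-based pulse synchroniser can be sketched as follows. The module and port names are hypothetical, and the sketch assumes source pulses are spaced far enough apart (several destination clock cycles) for each toggle to be captured.

```verilog
// Toggle-based pulse synchroniser: pulse -> level toggle -> 2-FF sync -> edge detect
module pulse_sync (
    input  wire clk_src, rst_src_n,
    input  wire pulse_src,      // one-cycle pulse in the source domain
    input  wire clk_dst, rst_dst_n,
    output wire pulse_dst       // one-cycle pulse in the destination domain
);
    // Source domain: each pulse flips a level, which is safe to synchronise
    reg toggle_src;
    always @(posedge clk_src or negedge rst_src_n) begin
        if (!rst_src_n)     toggle_src <= 1'b0;
        else if (pulse_src) toggle_src <= ~toggle_src;
    end

    // Destination domain: 2-FF synchroniser plus one extra FF for edge detection
    reg [2:0] sync;
    always @(posedge clk_dst or negedge rst_dst_n) begin
        if (!rst_dst_n) sync <= 3'b000;
        else            sync <= {sync[1:0], toggle_src};
    end

    // Any change of the synchronised level marks exactly one source pulse
    assign pulse_dst = sync[2] ^ sync[1];
endmodule
```

Converting the pulse to a level first is the key step: a narrow pulse from a fast source clock could fall entirely between two destination clock edges, whereas a level persists until it is sampled.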

    12. Parameterisation and Scalable RTL Design

    Writing parameterisable RTL is a hallmark of professional RTL design. A parameterisable module can be reused across multiple projects, multiple chips, and multiple configurations without rewriting the code. Using parameter and localparam in Verilog (or generic in VHDL) to define bus widths, depths, and other configuration values is a fundamental best practice.

    Verilog – Parameterisable FIFO Controller

    module fifo_ctrl #(
        parameter DATA_WIDTH = 8,   // Configurable data width
        parameter FIFO_DEPTH = 16,  // Configurable depth
        parameter ADDR_WIDTH = 4    // log2(FIFO_DEPTH)
    ) (
        input  wire                  clk, rst_n,
        input  wire                  wr_en, rd_en,
        input  wire [DATA_WIDTH-1:0] wr_data,
        output reg  [DATA_WIDTH-1:0] rd_data,
        output wire                  full, empty
    );
        reg [DATA_WIDTH-1:0] mem [FIFO_DEPTH-1:0];
        reg [ADDR_WIDTH:0]   wr_ptr, rd_ptr;  // Extra bit for full/empty detect
    
        assign full  = (wr_ptr == {~rd_ptr[ADDR_WIDTH], rd_ptr[ADDR_WIDTH-1:0]});
        assign empty = (wr_ptr == rd_ptr);
    
        always @(posedge clk or negedge rst_n) begin
            if (!rst_n) begin
                wr_ptr <= {(ADDR_WIDTH+1){1'b0}};  // Verilog-2001 replication ('0 is SystemVerilog-only)
                rd_ptr <= {(ADDR_WIDTH+1){1'b0}};
            end else begin
                if (wr_en && !full) begin
                    mem[wr_ptr[ADDR_WIDTH-1:0]] <= wr_data;
                    wr_ptr <= wr_ptr + 1'b1;
                end
                if (rd_en && !empty) begin
                    rd_data <= mem[rd_ptr[ADDR_WIDTH-1:0]];
                    rd_ptr  <= rd_ptr + 1'b1;
                end
            end
        end
    endmodule
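
Parameters only pay off when they are overridden at instantiation. The sketch below uses a deliberately small, hypothetical module (param_reg) rather than the FIFO, to show the same source instantiated once with its default width and once with a named parameter override.

```verilog
// Parameterised register: WIDTH sets the number of flip-flops
module param_reg #(
    parameter WIDTH = 8
) (
    input  wire             clk, rst_n, en,
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    always @(posedge clk or negedge rst_n) begin
        if (!rst_n)  q <= {WIDTH{1'b0}};   // width-independent reset value
        else if (en) q <= d;
    end
endmodule

// Top level: one source module, two hardware configurations
module param_reg_top (
    input  wire        clk, rst_n, en,
    input  wire [7:0]  d8,
    input  wire [31:0] d32,
    output wire [7:0]  q8,
    output wire [31:0] q32
);
    // Uses the default WIDTH of 8
    param_reg u_reg8 (.clk(clk), .rst_n(rst_n), .en(en), .d(d8), .q(q8));

    // Named parameter override: same code, 32 flip-flops
    param_reg #(.WIDTH(32)) u_reg32 (.clk(clk), .rst_n(rst_n), .en(en), .d(d32), .q(q32));
endmodule
```

Named overrides (#(.WIDTH(32))) are generally preferable to positional ones or defparam, because they keep working when parameters are added or reordered.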

    13. RTL Coding Guidelines and Best Practices

    Professional RTL design follows strict coding guidelines that ensure the code is readable, maintainable, synthesisable, and produces predictable hardware. The following guidelines are used by top semiconductor companies and are also enforced by RTL lint tools like Synopsys SpyGlass and Mentor Questa Lint.

    | # | Guideline | Reason |
    |---|-----------|--------|
    | 1 | Always use <= for sequential and = for combinational | Prevents sim-synth mismatches |
    | 2 | Always use @(*) (or always_comb) for combinational logic | Prevents incomplete sensitivity list latches |
    | 3 | Always include default in case statements | Prevents latch inference |
    | 4 | Always cover all branches in if-else for combinational | Prevents latch inference |
    | 5 | Never use delays (#) in RTL | Not synthesisable |
    | 6 | Never use initial blocks in synthesisable RTL | Not synthesisable (except FPGA RAM init) |
    | 7 | Use parameter for all constants – no magic numbers | Readability and reusability |
    | 8 | One clock domain per always block – never mix clocks | Prevents CDC issues |
    | 9 | Name all flip-flops and state signals descriptively | Readability and debug |
    | 10 | Add file header with author, date, description, revision | Documentation standard |
    | 11 | Separate combinational and sequential logic into different always blocks | Cleaner synthesis, easier debug |
    | 12 | Use consistent naming conventions (e.g., _n suffix for active-low signals) | Team readability |
    | 13 | Avoid relying on full_case / parallel_case synthesis pragmas – use an explicit default | Synthesis tool portability |
    | 14 | Keep module size manageable – < 500 lines per module | Synthesis and debug efficiency |
    | 15 | Use hierarchical design – break large functions into sub-modules | Enables incremental synthesis and team collaboration |
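
    Several of the guidelines above can be seen working together in one small module. The following is a minimal sketch, not a reference implementation: the module name event_counter, its ports, and the WIDTH parameter are hypothetical and chosen only for illustration.

    ```verilog
    // Hypothetical example module: names and widths are illustrative only.
    module event_counter #(
        parameter WIDTH = 8                  // Guideline 7: parameter, no magic numbers
    ) (
        input  wire             clk,
        input  wire             rst_n,       // Guideline 12: _n marks an active-low signal
        input  wire             count_en,
        output reg  [WIDTH-1:0] count_q
    );
        reg [WIDTH-1:0] count_d;             // next value of the counter

        // Guidelines 1, 2, 4, 11: combinational block, @(*), blocking assignments
        always @(*) begin
            count_d = count_q;               // default assignment, so no latch is inferred
            if (count_en)
                count_d = count_q + 1'b1;
        end

        // Guidelines 1, 8, 11: one clock, sequential block, non-blocking assignments
        always @(posedge clk or negedge rst_n) begin
            if (!rst_n)
                count_q <= {WIDTH{1'b0}};
            else
                count_q <= count_d;
        end
    endmodule
    ```

    The asynchronous active-low reset used here is one possible choice; a synchronous reset would follow the same two-block structure.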

    14. RTL Lint Checks and Static Analysis

    Before any RTL design is handed to the verification team or synthesis tool, it must pass lint analysis – an automated static analysis that checks the code for coding guideline violations, potential simulation-synthesis mismatches, and logic correctness issues. Lint is the first quality gate in the VLSI design flow after RTL coding.

    Industry RTL Lint Tools

    • Synopsys SpyGlass: The industry-leading RTL lint tool – checks for hundreds of coding quality rules, CDC issues, and synthesis issues
    • Mentor Questa Lint (Siemens): Comprehensive lint and CDC analysis
    • Cadence HAL: Cadence’s RTL analysis tool
    • Verilator (open source): Fast Verilog linter and simulator – excellent for open-source RTL design

    Common RTL Lint Violations

    | Violation | Severity | Cause | Fix |
    |-----------|----------|-------|-----|
    | Latch inferred | 🔴 Error | Incomplete always block coverage | Add default assignment or else clause |
    | Incomplete sensitivity list | 🔴 Error | always @(a) instead of always @(*) | Use always @(*) or always_comb |
    | Blocking in sequential block | 🔴 Error | = used inside always @(posedge clk) | Change to <= |
    | Multiple drivers on net | 🔴 Error | Two always blocks drive same signal | Merge into single always block |
    | Bit-width mismatch | 🟡 Warning | Assigning 8-bit to 4-bit without truncation intent | Explicit width matching or truncation |
    | Undriven output | 🟡 Warning | Output port not assigned in all conditions | Assign default value |
    | Unsynthesisable construct | 🔴 Error | #delay or initial in RTL | Remove or move to testbench |
    | CDC violation | 🔴 Error | Multi-bit bus crossing clock domain without sync | Add async FIFO or handshake |
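
    As an illustration of the bit-width mismatch row, the sketch below makes an 8-bit to 4-bit truncation explicit with a part-select, which is the usual way to show a lint tool that the dropped bits are intentional. The module and signal names are hypothetical, and the exact rule that a given lint tool applies may differ.

    ```verilog
    // Hypothetical example: explicit truncation to avoid a width-mismatch warning.
    module width_match (
        input  wire [7:0] wide_in,
        output wire [3:0] narrow_out
    );
        // Writing "assign narrow_out = wide_in;" would silently drop bits [7:4].
        // The part-select documents that only the low nibble is wanted.
        assign narrow_out = wide_in[3:0];
    endmodule
    ```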

    15. Writing RTL That Synthesises Well

    Good RTL design is not just about functional correctness – it must also synthesise efficiently to meet timing, power, and area targets. The way you write RTL has a direct impact on the quality of the synthesised hardware. Here are the key principles for writing synthesis-friendly RTL:

    Arithmetic Operations and Critical Path Awareness

    Every arithmetic operation in RTL maps to physical logic gates. Addition, subtraction and comparisons are relatively fast. Multiplication maps to dedicated multiplier cells or DSP blocks. Division is very slow and area-intensive. In timing-critical paths, try to replace division with a right shift (when the divisor is a constant power of 2) or with a dedicated lookup-table-based solution.

    ❌ Slow Critical Path

    // Deep chain of operations
    // = long critical path
    always @(*) begin
      result = (a + b) * c
               / d + e - f;
    end

    ✅ Pipelined for Speed

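    // Break into pipeline stages.
    // c, d, e and f are assumed stable (or delayed to match)
    // while a given a/b pair moves through the pipeline.
    always @(posedge clk) begin
      stage1 <= a + b;          // Cycle 1
      stage2 <= stage1 * c;     // Cycle 2
      stage3 <= stage2 / d;     // Cycle 3 (the divider is still the slowest stage)
      result <= stage3 + e - f; // Cycle 4
    end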
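
    The advice above on replacing division by a power of 2 with a right shift can be sketched as follows. This is a minimal example under stated assumptions: the module name div_by_8, the WIDTH parameter, and the fixed divisor of 8 are all hypothetical, and the equivalence holds for unsigned values only.

    ```verilog
    // Hypothetical example: unsigned division by a constant power of 2.
    module div_by_8 #(
        parameter WIDTH = 16
    ) (
        input  wire [WIDTH-1:0] dividend,
        output wire [WIDTH-1:0] quotient
    );
        // Dividing an unsigned value by 8 (2**3) equals a 3-bit logical
        // right shift, which is pure wiring and needs no divider logic.
        assign quotient = dividend >> 3;
    endmodule
    ```

    For signed operands the two are not interchangeable: Verilog's signed division truncates toward zero, while an arithmetic right shift (>>>) rounds toward negative infinity, so negative values can differ by one.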

    16. Common RTL Design Mistakes and How to Avoid Them

    | # | Mistake | Consequence | How to Avoid |
    |---|---------|-------------|--------------|
    | 1 | Using #delay in synthesisable RTL | Synthesis ignores delay – hardware behaves differently from simulation | Never use # in RTL – use SDC constraints for timing |
    | 2 | Incomplete case / if-else (no default) | Latch inferred – RTL lint error, timing analysis failure | Always include default in case, else in if for combinational |
    | 3 | Using blocking assignment in sequential logic | Race conditions – simulation passes but silicon fails | Always use <= in sequential always blocks |
    | 4 | Missing CDC synchroniser on cross-domain signals | Metastability – intermittent failures in silicon | Always use 2-FF synchroniser or async FIFO for CDC |
    | 5 | Gating the clock in RTL (using logic to control clock) | Clock glitches → data corruption; DTA/STA nightmare | Use clock enable signals with ICG (Integrated Clock Gating) cells |
    | 6 | Reset fan-out too high (one reset driving millions of FFs) | Reset timing violations; reset doesn’t propagate | Use reset tree with buffers; let synthesis handle reset tree |
    | 7 | Combinational loops (output feeds directly back to input with no register) | Simulation glitching, oscillation; unsynthesisable in most cases | Add a register to break every feedback path |
    | 8 | Unintentional X propagation (uninitialised signals) | X-propagation – simulation passes but masks real bugs | Initialise all flip-flops via reset; use SVA X-checks |
    | 9 | Over-complicated RTL in single module | Slow synthesis, hard to debug, poor reusability | Break into hierarchical sub-modules |
    | 10 | Not running lint before handoff | RTL with latches, CDC issues, coding violations reaches synthesis | Make SpyGlass lint clean a mandatory RTL handoff criterion |
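
    Mistake 5 in the table above is often the least obvious, so here is a minimal sketch of the recommended alternative: a register with a clock enable instead of a hand-gated clock. The module name enabled_reg and its ports are hypothetical. Whether the enable is actually mapped to an ICG cell depends on the synthesis tool and its clock-gating settings.

    ```verilog
    // Hypothetical example: clock enable instead of a gated clock.
    module enabled_reg #(
        parameter WIDTH = 8
    ) (
        input  wire             clk,
        input  wire             rst_n,
        input  wire             load_en,
        input  wire [WIDTH-1:0] d,
        output reg  [WIDTH-1:0] q
    );
        // Avoid: wire gated_clk = clk & load_en;
        // A change on load_en while clk is high can glitch gated_clk.

        // The flip-flops stay on the free-running clock and simply hold
        // their value when load_en is low.
        always @(posedge clk or negedge rst_n) begin
            if (!rst_n)
                q <= {WIDTH{1'b0}};
            else if (load_en)
                q <= d;
        end
    endmodule
    ```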

    🎯 Key Takeaways – RTL Design

    • RTL design describes digital hardware in terms of register transfers per clock cycle – it is hardware description, not software programming
    • RTL code must be synthesisable – all non-synthesisable constructs (#delay, initial, $display) belong in testbenches only
    • Use <= (non-blocking) for sequential logic and = (blocking) for combinational – mixing causes simulation-synthesis mismatches
    • Always include default in case statements and else in combinational if-else to prevent latch inference
    • Use always @(*) for all combinational logic to ensure complete sensitivity lists
    • FSM coding with three separate always blocks (state register, next-state, output) is the industry best practice
    • CDC signals must always be synchronised – 2-FF synchroniser for single bits, async FIFO for multi-bit buses
    • Use parameterised RTL modules for reusability across chips and projects
    • Always run RTL lint (SpyGlass) before handoff to verification and synthesis
    • Clean RTL = faster synthesis, better timing closure, fewer silicon respins

    17. Frequently Asked Questions (FAQ)

    Q1: What is RTL design in VLSI?

    RTL design in VLSI stands for Register Transfer Level design. It is the process of describing a digital circuit’s functionality using Hardware Description Languages (Verilog, SystemVerilog, or VHDL) at a level of abstraction where the design is described in terms of data transfers between registers per clock cycle and the combinational logic that transforms that data. RTL design is the primary input for logic synthesis tools in the VLSI design flow and is the standard methodology for all digital IC design.

    Q2: What is the difference between synthesisable and non-synthesisable Verilog?

    Synthesisable Verilog constructs are those that logic synthesis tools (like Synopsys Design Compiler) can translate into actual hardware gates. These include: always blocks with clock or @(*), assign statements, if-else, case, for loops with fixed bounds, module instantiation, and arithmetic/bitwise operators. Non-synthesisable constructs are those used only for simulation: #delay, initial blocks, $display/$monitor/$finish system tasks, force/release, fork/join, and dynamic memory allocation. Using non-synthesisable constructs in RTL code causes the synthesis tool to either error out or produce incorrect hardware.

    Q3: When should I use blocking vs non-blocking assignments in RTL design?

    Use non-blocking assignments (<=) in ALL sequential (clock-edge triggered) always blocks – this ensures that all flip-flops update simultaneously at the clock edge, which is the correct hardware behaviour. Use blocking assignments (=) in ALL combinational (always @*) blocks – this ensures proper sequential evaluation of combinational logic within the block. Never mix both types in the same always block. This is arguably the single most important rule in Verilog RTL design.
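
    A two-stage delay line is a common way to see why the rule matters. The sketch below uses hypothetical names (two_stage_delay, q1, q2) and is meant only to illustrate the ordering behaviour described above.

    ```verilog
    // Hypothetical example: non-blocking assignments in a sequential block.
    module two_stage_delay (
        input  wire clk,
        input  wire d,
        output reg  q2
    );
        reg q1;

        // Non-blocking: both right-hand sides are sampled before either
        // register updates, so q2 receives the OLD value of q1.
        // The result is two flip-flops in series.
        always @(posedge clk) begin
            q1 <= d;
            q2 <= q1;
        end

        // With blocking assignments in the same order (q1 = d; q2 = q1;)
        // q2 would take the NEW q1 immediately, so q2 would follow d
        // after one clock instead of two.
    endmodule
    ```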

    Q4: What causes latch inference in RTL design and how to avoid it?

    A latch is inferred in RTL when a signal in a combinational always block is not assigned under all possible input conditions. The synthesis tool assumes that if a signal is not assigned, it must “remember” its previous value – which implies a latch. To avoid latch inference: (1) Always include a default assignment at the beginning of combinational always blocks. (2) Always include a default case in case statements. (3) Always include an else clause in if-else statements within combinational logic. Running RTL lint (SpyGlass) will flag all latch inference warnings.
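
    The three measures listed above can be combined in a single combinational block. The following is a minimal sketch with hypothetical names (grant_decode, sel, req, grant); in practice the default assignment alone is already enough to prevent the latch, and the other two are shown for completeness.

    ```verilog
    // Hypothetical example: latch-free combinational decode.
    module grant_decode (
        input  wire [1:0] sel,
        input  wire       req,
        output reg  [3:0] grant
    );
        always @(*) begin
            grant = 4'b0000;                      // (1) default assignment at the top
            if (req) begin
                case (sel)
                    2'b00:   grant = 4'b0001;
                    2'b01:   grant = 4'b0010;
                    2'b10:   grant = 4'b0100;
                    default: grant = 4'b1000;     // (2) default case
                endcase
            end else begin
                grant = 4'b0000;                  // (3) explicit else branch
            end
        end
    endmodule
    ```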

    Q5: What is Clock Domain Crossing (CDC) and why is it important in RTL design?

    Clock Domain Crossing (CDC) occurs when a signal must cross from logic driven by one clock to logic driven by a different, asynchronous clock. Without proper CDC handling, the receiving flip-flop may go metastable – its output becomes indeterminate, causing unpredictable chip behaviour. CDC is handled using synchronisers: a 2-FF synchroniser for single-bit signals, and an asynchronous FIFO (with Gray-coded pointers) for multi-bit data buses. CDC violations are a leading cause of first-silicon failures and must be verified using dedicated CDC analysis tools (SpyGlass CDC, Questa CDC) before tapeout.
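
    For the single-bit case, a 2-FF synchroniser can be sketched as below. The module and signal names (sync_2ff, clk_dst, async_in, and so on) are hypothetical. The sketch assumes async_in is driven directly by a flip-flop in the source domain and changes slowly enough for the destination clock to sample it. It is not suitable for multi-bit buses.

    ```verilog
    // Hypothetical example: 2-FF synchroniser for one single-bit signal.
    module sync_2ff (
        input  wire clk_dst,     // destination-domain clock
        input  wire rst_dst_n,   // destination-domain reset, active low
        input  wire async_in,    // signal arriving from another clock domain
        output wire sync_out     // safe to use in the clk_dst domain
    );
        reg meta_q;   // first stage: may go metastable
        reg sync_q;   // second stage: gives the first stage a cycle to settle

        always @(posedge clk_dst or negedge rst_dst_n) begin
            if (!rst_dst_n) begin
                meta_q <= 1'b0;
                sync_q <= 1'b0;
            end else begin
                meta_q <= async_in;
                sync_q <= meta_q;
            end
        end

        assign sync_out = sync_q;
    endmodule
    ```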

    Q6: What is the three-process FSM coding style?

    The three-process FSM coding style separates the FSM implementation into three distinct always blocks: (1) The state register – a sequential always block that updates current_state to next_state on each clock edge. (2) The next-state logic – a combinational always block that determines the next state based on current state and inputs. (3) The output logic – a combinational always block that generates outputs based on the current state (Moore) or current state and inputs (Mealy). This style is the industry best practice because it is the most readable, most maintainable, and most synthesis-friendly way to code FSMs.
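
    The three blocks described above can be sketched for a small Moore machine as follows. The module name handshake_fsm, the states IDLE, RUN and FINISH, and the start, done and busy signals are hypothetical and exist only to show the structure.

    ```verilog
    // Hypothetical example: three-process Moore FSM.
    module handshake_fsm (
        input  wire clk,
        input  wire rst_n,
        input  wire start,
        input  wire done,
        output reg  busy
    );
        localparam [1:0] IDLE   = 2'b00,
                         RUN    = 2'b01,
                         FINISH = 2'b10;

        reg [1:0] current_state, next_state;

        // (1) State register: sequential, non-blocking
        always @(posedge clk or negedge rst_n) begin
            if (!rst_n)
                current_state <= IDLE;
            else
                current_state <= next_state;
        end

        // (2) Next-state logic: combinational, blocking
        always @(*) begin
            next_state = current_state;           // default: stay in the same state
            case (current_state)
                IDLE:    if (start) next_state = RUN;
                RUN:     if (done)  next_state = FINISH;
                FINISH:  next_state = IDLE;
                default: next_state = IDLE;       // recover from the unused encoding
            endcase
        end

        // (3) Output logic: Moore, depends on current_state only
        always @(*) begin
            busy = 1'b0;                          // default output value
            case (current_state)
                RUN:     busy = 1'b1;
                default: busy = 1'b0;
            endcase
        end
    endmodule
    ```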

    18. Conclusion

    RTL design is the bedrock of every digital chip manufactured in the world. It is the stage where hardware engineers translate architecture specifications into concrete, synthesisable hardware descriptions using Verilog, SystemVerilog, or VHDL. The quality of the RTL code written at this stage directly determines the success of everything that follows in the VLSI design flow – from how quickly verification achieves coverage closure, to how efficiently logic synthesis produces a quality netlist, to whether the physical design team achieves timing closure without costly design changes.

    In this comprehensive guide, we covered all aspects of professional RTL design: the definition and abstraction levels, Verilog and VHDL language fundamentals, synthesisable vs non-synthesisable constructs, sequential and combinational logic coding, FSM coding best practices, reset strategies, CDC handling, parameterisation, coding guidelines, lint checking, and the most common RTL mistakes that cost engineers time and companies money.

    The key message is this: RTL design is hardware description. Every line of code you write becomes physical silicon. Write it with the care and precision that hardware demands – not the casualness that software sometimes permits. Clean, lint-free, well-structured, synthesisable RTL code is the foundation of every successful chip.

