Why Determinism Still Matters in the Age of AI Verification
Introduction
Formal verification is built on three pillars: assertions, constraints, and formal engines. Assertions describe intended behavior and correctness criteria. Constraints prevent the design from entering illegal states. Formal engines evaluate those assertions deterministically under the given constraints: a fixed input always leads to the same proof result.
Recent interest in large language models has popularized the idea of automating the creation of assertions from text descriptions, specifications, or waveform patterns. While this is an exciting direction, assertions generated by AI models are probabilistic, unstable across runs, and often syntactically or logically incorrect [1]. This instability introduces risk long before a proof ever begins. Once assertions are frozen, formal engines evaluate deterministically [2], but no solver can recover from an incorrect or incomplete property. Reliable verification, therefore, depends on deterministic assertion creation, not probabilistic text generation. AI may appear productive, yet even minor deviations in semantic meaning can invalidate verification intent. For sign-off critical logic, assertion determinism is non-negotiable.
The risk of AI-generated assertions
Large language models generate text through statistical token prediction, meaning two identical prompts may produce different assertions. In practice, the same instability appears when generating SystemVerilog assertions, coverage intents, or property patterns. Even when formatting appears correct, semantics can be subtly wrong. Typical issues include:
- Misinterpreted signal polarity
- Incorrect implications and sequence delays
- Silent omission of required edge conditions
- Assertions that pass but do not check the intended behavior
Studies show LLM-generated hardware verification artifacts exhibit semantic drift and instability across runs [1][3]. Two separate executions of the same prompt often produce structurally different assertions, making regression baselining unreliable.
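This run-to-run drift can be caught mechanically before a property ever reaches a solver. The sketch below is illustrative only (the helper names and the SVA strings are hypothetical, not from any specific tool): it fingerprints each generated assertion after whitespace normalization, so cosmetic differences are ignored while semantic ones, such as an implication operator drifting from `|->` to `|=>`, break the baseline.

```python
import hashlib
import re

def normalize(prop: str) -> str:
    """Collapse whitespace so purely cosmetic differences are ignored."""
    return re.sub(r"\s+", " ", prop.strip())

def fingerprint(prop: str) -> str:
    """Stable content hash of a property, usable as a regression baseline key."""
    return hashlib.sha256(normalize(prop).encode()).hexdigest()[:12]

# Hypothetical outputs of two runs of the same prompt: run_b silently drifts
# from overlapping (|->) to non-overlapping (|=>) implication, shifting the
# check by one clock cycle; run_c differs from run_a only in formatting.
run_a = "assert property (@(posedge clk) req |-> ##[1:3] ack);"
run_b = "assert property (@(posedge clk) req |=> ##[1:3] ack);"
run_c = "assert  property (@(posedge clk)  req |-> ##[1:3] ack);"

print(fingerprint(run_a) == fingerprint(run_b))  # False: semantic drift
print(fingerprint(run_a) == fingerprint(run_c))  # True: formatting only
```

A check like this cannot judge correctness, but it makes instability visible as a diff instead of a silent baseline change.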
Although instability is often discussed abstractly, Figure 1 demonstrates that LLM-based design and assertion flows have already been evaluated under formal verification conditions. ReFormAI [1] presents a pipeline that formally verifies LLM-generated RTL, tags it with Common Weakness Enumeration (CWE) vulnerability classifications, and labels it based on proof outcomes. The evaluation revealed high rates of verification rejects and counterexamples, confirming that probabilistic assertion creation introduces failure at generation time when assertions are produced without deterministic review and freezing.
Why deterministic assertion creation remains essential
Deterministic assertion creation, whether manual or automated through a stable non-AI method, produces consistent and reproducible verification intent. Across nightly regressions, these properties must always exhibit the same behavior. This stability is critical for sign-off, ISO-certified flows, reuse across IP boundaries, and certification evidence.
A stable assertion has three properties:
- It expresses design intent clearly and unambiguously
- It does not change across runs when the context does not change
- It captures edge cases instead of masking them with probability
Deterministic assertion practices establish a repeatable verification baseline that supports regression stability, reusable verification intent, and more efficient debugging when issues surface, as observed in assertion-based verification studies [4]. In contrast, AI-generated properties risk introducing false proofs, passing incorrect behavior, or masking corner-case violations that reappear in silicon.
At LUBIS EDA, the emphasis is on assertion quality rather than volume. A single incorrect property can invalidate the reliability of an entire verification IP, regardless of how fast it was generated.
Where AI can safely contribute
AI still has value as long as it remains outside the proof-correctness boundary. Safe usage patterns observed across the industry include:
- Triage and clustering of failure traces [5]
- Automated grouping of similar counterexamples
- Log pattern extraction to accelerate debugging
- Suggestion of assertion skeletons for expert refinement
- Exploration of assertion alternatives under supervision
- Configuration hints for proof performance tuning
In these applications, AI serves as an assistant that accelerates decision-making rather than replacing human judgment. The critical distinction is that AI suggestions undergo human review before any action is taken, and engineers retain full authority over verification decisions. AI can help engineers work faster; it cannot verify correctness. The verification result remains trustworthy only when the assertions themselves are deterministic, reviewable, and traceable back to specification intent.
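The triage and clustering items above can be sketched in a few lines. The example below is illustrative only; the trace fields and property names are assumed, not taken from any specific tool. It groups counterexample traces by the property they violate and the set of implicated signals, so engineers debug one representative per cluster instead of every trace:

```python
from collections import defaultdict

def cluster_counterexamples(traces: list[dict]) -> dict:
    """Group traces by (failed property, set of implicated signals)."""
    clusters = defaultdict(list)
    for t in traces:
        # Signal order is irrelevant, so a frozenset makes the key stable.
        key = (t["property"], frozenset(t["signals"]))
        clusters[key].append(t["id"])
    return dict(clusters)

traces = [
    {"id": "cex_001", "property": "req_ack_p", "signals": ["req", "ack"]},
    {"id": "cex_002", "property": "req_ack_p", "signals": ["ack", "req"]},
    {"id": "cex_003", "property": "fifo_ovfl_p", "signals": ["wr_en", "full"]},
]

clusters = cluster_counterexamples(traces)
print(len(clusters))  # 2: the two req_ack_p traces collapse into one cluster
```

Because this grouping only reorganizes solver output and never touches the properties themselves, it stays safely outside the proof-correctness boundary.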
Hybrid assertion workflows
Modern environments increasingly combine exploratory AI-assisted ideation with deterministic property creation. A correct workflow follows a funnel:
- AI proposes candidate intent patterns
- Engineers validate semantics and tighten boundaries
- Incorrect or ambiguous logic is discarded
- Final assertions are frozen under version control
- Formal engines prove against a stable property set
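The freeze step in this funnel can be enforced mechanically. Below is a minimal sketch, assuming a plain-text property file and a SHA-256 content hash as the frozen-baseline identifier; both are illustrative choices, and the file and function names are hypothetical:

```python
import hashlib
from pathlib import Path

def freeze(assertions: list[str], path: Path) -> str:
    """Write reviewed assertions to disk and return their content hash.
    The hash is what gets recorded (e.g. in version control) at sign-off."""
    text = "\n".join(assertions) + "\n"
    path.write_text(text)
    return hashlib.sha256(text.encode()).hexdigest()

def check_frozen(path: Path, recorded_hash: str) -> None:
    """Refuse to start a proof run if the frozen set has mutated."""
    actual = hashlib.sha256(path.read_bytes()).hexdigest()
    if actual != recorded_hash:
        raise RuntimeError("frozen assertion set changed; re-review required")

frozen = Path("frozen_props.sv")
h = freeze(["assert property (@(posedge clk) req |-> ##[1:3] ack);"], frozen)
check_frozen(frozen, h)  # passes silently while the file is untouched
```

Gating every proof launch on `check_frozen` turns "frozen under version control" from a convention into an enforced invariant.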
This hybrid flow retains productivity while protecting reliability. AI contributes to the space of possible assertion structures, but correctness is achieved only when final assertions are deterministic and frozen. Results reported in AssertionBench show that even state-of-the-art LLMs frequently output functionally incorrect assertions without expert refinement [3]. This evidence confirms that sign-off quality depends on deterministic assertion creation rather than probabilistic generation.
Figure 2 illustrates a typical LLM-assisted pipeline for assertion generation. Large language models are trained on example properties and then applied to new RTL to produce candidate assertions. These outputs often require syntax correction and semantic review before a deterministic formal verification engine can consume them. The formal engine still produces a stable PASS or FAIL result, but only after properties are frozen. This visualization supports separating AI exploration from deterministic sign-off, reinforcing that the primary source of risk lies in assertion generation rather than in proof evaluation.
The landscape has shifted, and several advances now come together. High-level model analysis can extract states, transitions, invariants, and dataflow from executable models such as C++ and SystemC. Automated property generation can transform those models into more complete assertion suites that also capture timing behavior and correctness requirements. Formal verification engines have matured to handle deeper pipelines and larger state spaces. AI assistance can make it easier to create structured, analyzable models by translating intent into code that tools can reason about. Together, these advances make assertion-first design far more practical than it has been for decades.
Where determinism delivers measurable value
Deterministic assertions enable verification repeatability, accelerate debug through traceable counterexamples, support ISO-compliant evidence generation for safety certification [6], and allow assertion IP to transfer cleanly across SoCs without revalidation overhead. Shared property baselines further enable multi-site consistency [7], and early detection of logic faults materially increases the probability of first-silicon success. Collectively, these benefits demonstrate that determinism is not a preference but an operational requirement for scaling design verification.
Figure 3 illustrates industry methodology adoption across three survey periods (2007, 2016, 2024) using Wilson Research Group data [7]. Assertion-based verification remains one of the most widely used techniques, alongside code coverage and constrained-random testing, and has shown sustained adoption even as automation evolves. This upward adoption trend supports the blog’s core argument: the value of assertions continues to increase over time, and therefore the determinism of those assertions remains essential for regression scalability, debug confidence, and sign-off accuracy.
Business impact
Regression predictability drives schedule trust. Managers can measure closure accurately only when nightly runs report against an unchanging assertion baseline. AI-generated assertions jeopardize this stability by altering intent unpredictably. Audit-ready verification data generated from deterministic flows shortens certification cycles, reduces respin risk, and lowers program costs [6][7]. In contrast, incorrect AI-generated assertions may mask a defect until silicon, resulting in high-cost recalls or functional failures that could have been prevented.
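The closure metric managers rely on is only meaningful against a fixed denominator. A minimal sketch (property names and result labels are hypothetical) that reports closure against the frozen baseline and flags any property that silently disappeared from tonight's run:

```python
def closure_report(baseline: set[str], results: dict[str, str]):
    """Closure = proven baseline properties / all baseline properties.
    Properties absent from the run count as not closed and are flagged."""
    proven = {p for p in baseline if results.get(p) == "proven"}
    missing = baseline - results.keys()
    return len(proven) / len(baseline), sorted(missing)

baseline = {"req_ack_p", "fifo_ovfl_p", "grant_onehot_p", "rst_clear_p"}
tonight = {"req_ack_p": "proven", "fifo_ovfl_p": "cex", "grant_onehot_p": "proven"}

rate, missing = closure_report(baseline, tonight)
print(rate)     # 0.5: two of four baseline properties proven
print(missing)  # ['rst_clear_p']: a dropped property breaks baseline integrity
```

If the assertion set itself shifts between runs, the denominator shifts with it and the reported closure number becomes untrustworthy, which is exactly the failure mode unstable generation introduces.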
Outlook and future research
Future progress in AI-guided verification will depend on mechanisms that reduce variance and improve semantic reliability. One emerging direction is VERT [8], a correctness-driven learning model that introduces verification-aware scoring and stability thresholds during training. Unlike naive generative prompting, it explicitly prioritizes logical soundness over fluency alone. VERT does not eliminate risk but represents research movement toward safer AI-assisted assertion systems. Reproducibility still requires human validation and version freezing. The corresponding public dataset repository provides a reproducible environment for experimentation.
AI may one day provide usable property suggestions at higher confidence levels, but determinism in assertion finalization will remain non-negotiable for sign-off.
Risk Matrix for AI-Generated Assertions
AI-generated assertions introduce unique failure modes that do not exist in deterministic flows. The matrix below summarizes the risks, likelihoods, potential impacts, and required mitigations.
Summary Note:
The highest-severity risks arise not from solvers but from generation instability. Deterministic assertion creation is the required control mechanism for eliminating compounding AI verification errors.
Key Definitions
- Deterministic Assertion Creation: The process of creating assertions such that the same input always produces the same property, enabling reproducible verification.
- AI-Generated Assertions: Assertions produced by probabilistic large language models, which may vary across runs and require human validation before sign-off.
- Semantic Drift: Variation in logical meaning produced by an AI model even when the input prompt remains unchanged.
- Regression Baseline / Nightly Run: Automated proof execution repeated regularly to detect design regressions; relies on stable assertions.
- Common Weakness Enumeration (CWE): A standardized security classification system for hardware and software vulnerabilities.
- Counterexample: A trace produced by a formal solver that demonstrates how a property was violated.
- AssertionBench: A benchmark suite used to evaluate the correctness and stability of LLM-generated assertions.
- ReFormAI: A dataset and formal verification pipeline containing LLM-generated RTL and labeled vulnerability classifications.
- VERT: A correctness-driven research model that introduces verification-aware scoring to reduce instability in AI-generated assertions.
- Version Freezing: Locking assertions to prevent mutation across regressions, enabling reproducible sign-off.
References
[1] D. N. Gadde, A. Kumar, T. Nalapat, E. Rezunov, and F. Cappellini, “All Artificial, Less Intelligence: GenAI through the Lens of Formal Verification,” Mar. 2024, [Online]. Available: http://arxiv.org/abs/2403.16750
[2] E. M. Clarke, T. A. Henzinger, H. Veith, and R. Bloem, Eds., Handbook of Model Checking. Springer International Publishing, 2018. doi: 10.1007/978-3-319-10575-8. [Online]. Available: http://staff.ustc.edu.cn/~huangwc/book/2018_Book_HandbookOfModelChecking.pdf
[3] V. Pulavarthi, D. Nandal, S. Dan, and D. Pal, “AssertionBench: A Benchmark to Evaluate Large-Language Models for Assertion Generation,” Feb. 2025, [Online]. Available: http://arxiv.org/abs/2406.18627
[4] H. Witharana et al., “A Survey on Assertion-Based Hardware Verification,” ACM Computing Surveys, 2022. [Online]. Available: https://www.cise.ufl.edu/research/cad/Publications/csur22.pdf
[5] M. Abdollahi, S. F. Yeganli, M. A. Baharloo, and A. Baniasadi, “Hardware Design and Verification with Large Language Models: A Scoping Review, Challenges, and Open Issues,” Electronics, vol. 14, no. 1, 2024, doi: 10.3390/electronics14010120. [Online]. Available: https://www.mdpi.com/2079-9292/14/1/120
[6] International Organization for Standardization, “ISO 26262-6:2018 Road vehicles – Functional safety – Part 6: Product development at the software level,” Geneva, 2018. [Online]. Available: https://cdn.standards.iteh.ai/samples/68388/34205953cd2c4c5f947890009caa464e/ISO-26262-6-2018.pdf
[7] H. D. Foster, “2024 Wilson Research Group IC/ASIC functional verification trend report,” 2024. [Online]. Available: https://verificationacademy.com/resource/a661652a-b077-4b68-98fd-b84750cd9c1c
[8] A. Brasoveanu, M. Moodie, and R. Agrawal, “Textual evidence for the perfunctoriness of independent medical reviews,” in CEUR Workshop Proceedings, CEUR-WS, 2020, pp. 1–9. [Online]. Available: https://arxiv.org/abs/2503.08923