Citation: CHEN Xiyuan, JIANG Yuxuan, XIA Yingjie, HU Ji, ZHOU Yizhao. VCodePPA: A Large-Scale Verilog Dataset with PPA Annotations[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250449

VCodePPA: A Large-Scale Verilog Dataset with PPA Annotations

doi: 10.11999/JEIT250449 cstr: 32379.14.JEIT250449
Funds:  The National Natural Science Foundation of China (62472132), Zhejiang Province “Jianbing Lingyan” Key Research and Development Program (2025C01063, 2024C01179, 2024C01232)
  • Received Date: 2025-05-21
  • Rev Recd Date: 2025-10-14
  • Available Online: 2025-10-20
Objective  Verilog is a predominant hardware description language, and the quality of Verilog code directly affects the Power, Performance, and Area (PPA) metrics of the resulting circuits. Current Large Language Model (LLM)-based approaches to hardware description language generation face a central challenge: incorporating a design feedback mechanism informed by PPA metrics to guide model optimization, rather than relying solely on syntactic and functional correctness. The field faces three major limitations: the absence of PPA annotations in training data, which prevents models from learning how code modifications affect physical characteristics; evaluation frameworks that remain disconnected from downstream engineering needs; and the lack of systematic data augmentation methods for generating functionally equivalent code with differentiated PPA characteristics. To address these gaps, we present VCodePPA, a large-scale dataset that establishes precise correlations between Verilog code structures and PPA metrics. The dataset comprises 17,342 entries and provides a foundation for data-driven optimization paradigms in hardware design.

Methods  Dataset construction begins with the collection of representative Verilog code samples from GitHub repositories, OpenCores projects, and standard textbooks. After careful selection, a seed dataset of 3,500 samples covering 20 functional categories is established. These samples are preprocessed through functional coverage optimization, syntax verification with Yosys, format standardization, deduplication, and complexity filtering. An automated PPA extraction pipeline is implemented in Vivado to evaluate performance characteristics, with metrics including Lookup Table (LUT) count, register usage, maximum operating frequency, and power consumption. To enhance dataset diversity while preserving functional equivalence, a multi-dimensional code transformation framework is applied, consisting of nine methods across three dimensions: the architecture layer (finite state machine encoding, interface protocol reconstruction, arithmetic unit replacement), the logic layer (control flow reorganization, operator rewriting, logic hierarchy restructuring), and the timing layer (critical path cutting, register retiming, pipeline insertion or deletion). Efficient exploration of the transformation space is achieved through a Heterogeneous Verilog Mutation Search (HVMS) algorithm based on Monte Carlo Tree Search (MCTS), which generates 5–10 PPA-differentiated variants for each seed code, as illustrated in the sketch below. A dual-task LLM training strategy with PPA-guided adaptive loss functions is subsequently employed, incorporating contrastive learning mechanisms to capture the relationship between code structure and physical implementation.
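To make the search procedure concrete, the following Python sketch shows an MCTS-flavored mutation loop of the kind HVMS describes: each of the nine transformation methods is treated as an arm, selection uses UCB1, and variants are scored by how far their synthesized PPA diverges from the seed. The helpers `apply_transform` and `synthesize_ppa`, the metric names, and the scoring weights are hypothetical stand-ins for the paper's actual tooling, not its implementation.

```python
import math
import random

# The nine transformation methods, grouped by dimension as in the paper.
TRANSFORMS = {
    "architecture": ["fsm_encoding", "interface_reconstruction",
                     "arithmetic_unit_replacement"],
    "logic":        ["control_flow_reorganization", "operator_rewriting",
                     "logic_hierarchy_restructuring"],
    "timing":       ["critical_path_cutting", "register_retiming",
                     "pipeline_insertion_deletion"],
}

def apply_transform(code: str, method: str) -> str:
    """Hypothetical: return a functionally equivalent Verilog mutant."""
    return code + f"\n// mutated by {method}"

def synthesize_ppa(code: str) -> dict:
    """Hypothetical: run synthesis and return PPA metrics."""
    return {"lut": random.randint(50, 200),
            "freq_mhz": random.uniform(100.0, 400.0),
            "power_mw": random.uniform(5.0, 50.0)}

def ppa_distance(a: dict, b: dict) -> float:
    """Mean relative PPA divergence between two implementations."""
    return sum(abs(a[k] - b[k]) / max(abs(a[k]), 1e-9) for k in a) / len(a)

def hvms_search(seed_code: str, n_variants: int = 8,
                iterations: int = 200, c: float = 1.4) -> list:
    """MCTS-style search: UCB1 balances exploring rarely tried
    transforms against exploiting those yielding high PPA divergence."""
    seed_ppa = synthesize_ppa(seed_code)
    arms = [m for group in TRANSFORMS.values() for m in group]
    visits = {m: 0 for m in arms}
    rewards = {m: 0.0 for m in arms}
    variants = []
    for t in range(1, iterations + 1):
        # Selection: UCB1 over the transform arms.
        def ucb(m):
            if visits[m] == 0:
                return float("inf")
            return rewards[m] / visits[m] + c * math.sqrt(math.log(t) / visits[m])
        method = max(arms, key=ucb)
        # Expansion + simulation: mutate, synthesize, score divergence.
        mutant = apply_transform(seed_code, method)
        reward = ppa_distance(seed_ppa, synthesize_ppa(mutant))
        # Backpropagation: update the chosen arm's statistics.
        visits[method] += 1
        rewards[method] += reward
        variants.append((reward, mutant))
    # Keep the most PPA-differentiated variants (5-10 per seed).
    variants.sort(key=lambda rv: rv[0], reverse=True)
    return [v for _, v in variants[:n_variants]]
```

In the actual pipeline, the synthesis step would invoke the Vivado-based extraction flow described above, and each mutant's functional equivalence would be checked before admission to the dataset.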
Results and Discussions  The VCodePPA dataset achieves broad coverage of digital hardware design scenarios, representing approximately 85–90% of common design contexts. The multi-dimensional transformation framework generates functionally equivalent yet structurally diverse code variants, with PPA differences exceeding 20%, thereby exposing the optimization trade-offs inherent in hardware design. Experimental evaluation demonstrates that models trained on VCodePPA show marked improvements in PPA optimization across multiple Verilog functional categories, including arithmetic, memory, control, and hybrid modules; in testing scenarios, VCodePPA-trained models produce implementations with superior PPA metrics compared with baseline models. The PPA-oriented adaptive loss function overcomes a traditional limitation of language-model training, which typically lacks sensitivity to hardware implementation efficiency. By integrating contrastive learning and variant-comparison loss mechanisms (a combined objective of this form is sketched below), the model achieves an average improvement of 17.7% across PPA metrics on the test set, influencing 32.4% of token-level predictions in code generation tasks. Notably, VCodePPA-trained models reduce on-chip resource usage by 10–15%, decrease power consumption by 8–12%, and shorten critical path delay by 5–8% relative to baseline models.
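The dual-task objective can be pictured as a weighted sum of a token-level language-modeling loss and a contrastive term over PPA-differentiated variants. The PyTorch sketch below is a minimal illustration under assumed tensor interfaces: `gen_emb`, `variant_emb`, and `ppa_scores` are hypothetical names, and the `tanh`-based adaptive weight is an assumption, not the paper's scheme.

```python
import torch
import torch.nn.functional as F

def ppa_guided_loss(token_logits, target_ids, gen_emb, variant_emb,
                    ppa_scores, alpha=0.5, temperature=0.1):
    """Dual-task objective (illustrative): next-token cross-entropy
    plus an InfoNCE-style term that pulls the generated code's
    embedding toward the best-PPA variant and away from worse ones.

    token_logits: (batch, seq, vocab) LM outputs
    target_ids:   (batch, seq) gold Verilog tokens
    gen_emb:      (dim,) embedding of the generated code (anchor)
    variant_emb:  (n_variants, dim) embeddings of functionally
                  equivalent variants of the same seed design
    ppa_scores:   (n_variants,) scalar quality, higher = better PPA
    """
    # Task 1: ordinary language-modeling loss on the code tokens.
    lm_loss = F.cross_entropy(
        token_logits.reshape(-1, token_logits.size(-1)),
        target_ids.reshape(-1),
    )

    # Task 2: PPA-contrastive loss. The best-PPA variant is the
    # positive; the remaining variants serve as negatives.
    anchor = F.normalize(gen_emb, dim=-1)
    emb = F.normalize(variant_emb, dim=-1)
    sims = emb @ anchor / temperature           # (n_variants,)
    best = torch.argmax(ppa_scores)
    contrastive = -sims[best] + torch.logsumexp(sims, dim=0)

    # Adaptive weighting (assumed): strengthen the contrastive signal
    # when the PPA spread across variants is large.
    spread = (ppa_scores.max() - ppa_scores.min()).clamp(min=0.0)
    weight = alpha * torch.tanh(spread)
    return lm_loss + weight * contrastive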
Conclusions  This paper introduces VCodePPA, a large-scale Verilog dataset with precise PPA annotations, addressing the gap between code generation and physical implementation optimization. The main contributions are as follows: (1) construction of a seed dataset spanning 20 functional categories with 3,500 samples, expanded through systematic multi-dimensional code transformation to 17,342 entries with comprehensive PPA metrics; (2) development of an MCTS-based heterogeneous code augmentation scheme employing nine transformation methods across the architectural, logical, and timing layers to generate functionally equivalent code variants with significant PPA differences; and (3) design of a dual-task training framework with PPA-oriented adaptive loss functions, enabling models to learn PPA trade-off principles directly from data rather than from manual heuristics or single-objective constraints. Experimental results demonstrate that models trained on VCodePPA effectively capture PPA balancing principles and generate optimized hardware description code. Future work will extend the dataset to more complex design scenarios and explore advanced optimization strategies for specialized application domains.