Citation: CHEN Xiyuan, JIANG Yuxuan, XIA Yingjie, HU Ji, ZHOU Yizhao. VCodePPA: A Large-Scale Verilog Dataset with PPA Annotations[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250449
[1] THAKUR S, AHMAD B, FAN Zhenxing, et al. Benchmarking large language models for automated Verilog RTL code generation[C]. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), Antwerp, Belgium, 2023: 1–6. doi: 10.23919/DATE56975.2023.10137086.
[2] LU Yao, LIU Shang, ZHANG Qijun, et al. RTLLM: An open-source benchmark for design RTL generation with large language model[C]. 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), Incheon, Republic of Korea, 2024: 722–727. doi: 10.1109/ASP-DAC58780.2024.10473904.
[3] LIU Mingjie, PINCKNEY N, KHAILANY B, et al. Invited paper: VerilogEval: Evaluating large language models for Verilog code generation[C]. 2023 IEEE/ACM International Conference on Computer Aided Design (ICCAD), San Francisco, USA, 2023: 1–8. doi: 10.1109/ICCAD57390.2023.10323812.
[4] LIU Mingjie, TSAI Y D, ZHOU Wenfei, et al. CraftRTL: High-quality synthetic data generation for Verilog code models with correct-by-construction non-textual representations and targeted code repair[C]. The Thirteenth International Conference on Learning Representations, Singapore, Singapore, 2025.
[5] ZHAO Yang, HUANG Di, LI Chongxiao, et al. CodeV: Empowering LLMs for Verilog generation through multi-level summarization[EB/OL]. https://arxiv.org/abs/2407.10424v4, 2024.
[6] THAKUR S, AHMAD B, PEARCE H, et al. VeriGen: A large language model for Verilog code generation[J]. ACM Transactions on Design Automation of Electronic Systems, 2024, 29(3): 46. doi: 10.1145/3643681.
[7] LIU Shang, FANG Wenji, LU Yao, et al. RTLCoder: Fully open-source and efficient LLM-assisted RTL code generation technique[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2025, 44(4): 1448–1461. doi: 10.1109/TCAD.2024.3483089.
[8] NADIMI B and ZHENG H. A multi-expert large language model architecture for Verilog code generation[C]. 2024 IEEE LLM Aided Design Workshop (LAD), San Jose, USA, 2024: 1–5. doi: 10.1109/LAD62341.2024.10691683.
[9] WU Peiyang, GUO Nan, LV Junliang, et al. RTLRepoCoder: Repository-level RTL code completion through the combination of fine-tuning and retrieval augmentation[EB/OL]. https://arxiv.org/abs/2504.08862, 2025.
[10] THORAT K, ZHAO Jiahui, LIU Yaotian, et al. Advanced large language model (LLM)-driven Verilog development: Enhancing power, performance, and area optimization in code synthesis[EB/OL]. https://arxiv.org/abs/2312.01022, 2023.
[11] PEI Zehua, ZHEN Huiling, YUAN Mingxuan, et al. BetterV: Controlled Verilog generation with discriminative guidance[C]. Forty-First International Conference on Machine Learning, Vienna, Austria, 2024.
[12] TSAI Y, LIU Mingjie, and REN Haoxing. RTLFixer: Automatically fixing RTL syntax errors with large language model[C]. Proceedings of the 61st ACM/IEEE Design Automation Conference, San Francisco, USA, 2024: 53. doi: 10.1145/3649329.3657353.
[13] PULAVARTHI V, NANDAL D, DAN S, et al. AssertionBench: A benchmark to evaluate large-language models for assertion generation[C]. Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, USA, 2025: 8058–8065. doi: 10.18653/v1/2025.findings-naacl.449.
[14] QIU Ruidi, ZHANG G L, DRECHSLER R, et al. AutoBench: Automatic testbench generation and evaluation using LLMs for HDL design[C]. Proceedings of the 2024 ACM/IEEE 6th International Symposium on Machine Learning for CAD, Salt Lake City, USA, 2024: 1–10. doi: 10.1109/MLCAD62225.2024.10740250.
[15] QIU Ruidi, ZHANG G L, DRECHSLER R, et al. CorrectBench: Automatic testbench generation with functional self-correction using LLMs for HDL design[C]. 2025 Design, Automation & Test in Europe Conference (DATE), Lyon, France, 2025: 1–7. doi: 10.23919/DATE64628.2025.10992873.
[16] NADIMI B, BOUTAIB G O, and ZHENG Hao. VeriMind: Agentic LLM for automated Verilog generation with a novel evaluation metric[EB/OL]. https://arxiv.org/abs/2503.16514, 2025.
[17] ABDELATTY M, MA Jingxiao, and REDA S. MetRex: A benchmark for Verilog code metric reasoning using LLMs[C]. Proceedings of the 30th Asia and South Pacific Design Automation Conference, Tokyo, Japan, 2025: 995–1001. doi: 10.1145/3658617.3697625.
[18] ZHANG Yongan, YU Zhongzhi, FU Yonggan, et al. MG-Verilog: Multi-grained dataset towards enhanced LLM-assisted Verilog generation[C]. 2024 IEEE LLM Aided Design Workshop (LAD), San Jose, USA, 2024: 1–5. doi: 10.1109/LAD62341.2024.10691738.
[19] WEI Anjiang, TAN Huanmi, SURESH T, et al. VeriCoder: Enhancing LLM-based RTL code generation through functional correctness validation[EB/OL]. https://arxiv.org/abs/2504.15659, 2025.
[20] ALLAM A and SHALAN M. RTL-Repo: A benchmark for evaluating LLMs on large-scale RTL design projects[C]. 2024 IEEE LLM Aided Design Workshop (LAD), San Jose, USA, 2024: 1–5. doi: 10.1109/LAD62341.2024.10691810.
[21] LIU Shang, LU Yao, FANG Wenji, et al. OpenLLM-RTL: Open dataset and benchmark for LLM-aided design RTL generation[C]. Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design, New York, USA, 2024: 60. doi: 10.1145/3676536.3697118.
[22] BUSH S, DELORENZO M, TIEU P, et al. Free and fair hardware: A pathway to copyright infringement-free Verilog generation using LLMs[C]. 62nd ACM/IEEE Design Automation Conference (DAC), San Francisco, USA, 2025: 1–7. doi: 10.1109/DAC63849.2025.11132658.
[23] WENG Lilian. Contrastive representation learning[EB/OL]. Lil’Log, https://lilianweng.github.io/posts/2021-05-31-contrastive/, 2021.
[24] GAO Tianyu, YAO Xingcheng, and CHEN Danqi. SimCSE: Simple contrastive learning of sentence embeddings[C]. Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 2021: 6894–6910. doi: 10.18653/v1/2021.emnlp-main.552.
[25] NEELAKANTAN A, XU Tao, PURI R, et al. Text and code embeddings by contrastive pre-training[EB/OL]. https://arxiv.org/abs/2201.10005, 2022.