Bayesian Optimization-driven Design Space Exploration Method for Coarse-Grained Reconfigurable Cipher Logic Array

JIANG Danping (蒋丹萍); DAI Zibin (戴紫彬); LIU Yanjiang (刘燕江); ZHOU Chaoxu (周朝旭); SONG Xiaoyu (宋晓玉)

doi:10.11999/JEIT250624

cstr: 32379.14.JEIT250624

Information Engineering University, Zhengzhou 450000, China

Funds: The National Natural Science Foundation of China (62302519)


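Author biographies:
JIANG Danping: Female, Ph.D. candidate. Research interests: security-specific chip design and reconfigurable computing.

DAI Zibin: Male, Ph.D., Professor, and doctoral supervisor. Research interests: information security, computer architecture, etc.

LIU Yanjiang: Male, Ph.D., Lecturer. Research interests: security-specific chip design, side-channel attacks, etc.

ZHOU Chaoxu: Male, Ph.D. candidate. Research interests: security-specific chip design and reconfigurable computing.

SONG Xiaoyu: Female, Ph.D. candidate. Research interests: security-specific chip design and reconfigurable computing.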

Corresponding author:
DAI Zibin, daizb2004@126.com

CLC number: TN492; TP309.7
Publication history:
- Received: 2025-07-03
- Revised: 2025-10-20
- Published online: 2025-10-24
- Issue date: 2025-11-10


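Abstract

Because the design space of the Coarse-Grained Reconfigurable Cipher logic Array (CGRCA) is enormous, design evaluation is time-consuming, and manual exploration finds solutions of limited quality with low search efficiency. To address the high-dimensional design space and multi-objective nature of CGRCA architectures, this paper proposes a multi-objective Design Space Exploration (DSE) method based on Bayesian optimization that improves solution quality while balancing throughput, area, and Function Unit (FU) utilization. First, a knowledge-aware unsupervised learning sampling strategy selects the initial samples, ensuring that they are representative and diverse. Second, a rapid evaluation model quantitatively evaluates each sample, shortening performance evaluation time. Third, an adaptive multi-acquisition function and a greedy-based hybrid surrogate model are designed, and a multi-objective Bayesian optimization method built on them searches for the optimal CGRCA architecture, improving search efficiency and generality. Experimental results show that, compared with other DSE methods, the proposed method reduces the Average Distance from Reference Set (ADRS) by up to 34.9%, increases hypervolume by 28.7%, increases throughput by 29.9%, reduces area by 6.0%, and improves FU utilization by 11.6%, while exhibiting excellent cross-algorithm stability.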
- Coarse-grained reconfigurable cipher logic array /
- Design space exploration /
- Bayesian optimization /
- Random forest /
- Neural network
Abstract:
Objective: Coarse-Grained Reconfigurable Cipher logic Arrays (CGRCAs) are widely employed in information security systems owing to their high flexibility, strong performance, and inherent security. Design Space Exploration (DSE) plays a critical role in evaluating and optimizing the performance of cryptographic algorithms deployed on CGRCAs. However, conventional DSE approaches require extensive computation time to locate optimal solutions in multi-objective optimization problems and often yield suboptimal solutions. To overcome these limitations, this study proposes a Bayesian optimization-based DSE framework, termed Multi-Objective Bayesian optimization-based Exploration (MOBE), which improves search efficiency and solution quality while satisfying the complex design requirements of CGRCA architectures.

Methods: The high-dimensional design space and multi-objective optimization characteristics of the CGRCA are analyzed, and its design space is systematically modeled. A Bayesian optimization-based DSE method is then proposed, comprising initial sampling design, rapid evaluation model construction, surrogate model development, and acquisition function optimization. A knowledge-aware unsupervised learning sampling strategy integrates domain-specific knowledge with clustering algorithms to improve the representativeness and diversity of the initial samples. A rapid evaluation model estimates throughput, area overhead, and Function Unit (FU) utilization for each sample, reducing the computational cost of performance evaluation. To improve both search efficiency and generalizability, a greedy-based hybrid surrogate model is constructed from a Gaussian Process with Deep Kernel Learning (DKL-GP), a random forest, and a neural network. In addition, an adaptive multi-acquisition function combines Expected Hypervolume Improvement (EHVI) and quasi-Monte Carlo Upper Confidence Bound (qUCB) to identify the most promising samples while balancing exploration and exploitation. The weighting ratio between EHVI and qUCB is adjusted dynamically to match the differing optimization requirements of each search phase.

Results and Discussions: The Bayesian optimization-based DSE method (Algorithm 2) combines initial sampling design, rapid evaluation model construction, surrogate model development, and acquisition function optimization to improve solution quality and search efficiency. Simulation results show that the knowledge-aware unsupervised learning sampling strategy reduces the Average Distance from Reference Set (ADRS) by up to 28.2% and increases hypervolume by up to 15.1% compared with existing sampling approaches (Table 3), mainly because it integrates domain knowledge with clustering. Compared with DSE methods built on a single surrogate model, the greedy-based hybrid surrogate model exploits the complementary strengths of multiple surrogate models across optimization stages and prioritizes the samples that contribute most to hypervolume expansion; it reduces ADRS by up to 31.7% and improves hypervolume by up to 20.0% (Table 4). The complete MOBE framework reduces ADRS by up to 34.9% and increases hypervolume by up to 28.7% relative to state-of-the-art DSE methods (Table 5). In terms of the average performance of Pareto-front samples, MOBE improves throughput by up to 29.9%, reduces area overhead by 6.0%, and improves FU utilization by 11.6% (Fig. 6), confirming the overall quality of its solutions. MOBE also exhibits excellent cross-algorithm stability in both hypervolume and Normalized Overall Execution Time (NOET) (Table 6 and Fig. 7).

Conclusions: This study presents a multi-objective DSE method based on Bayesian optimization that improves both solution quality and search efficiency for CGRCAs. The proposed approach employs a knowledge-aware unsupervised learning sampling strategy to generate an initial sample set with high representativeness and diversity. A rapid evaluation model is then developed to reduce the computational cost of performance assessment. Additionally, combining adaptive multi-acquisition functions with a greedy-based hybrid surrogate model further improves the efficiency and generalization capability of the DSE framework. Comparative experiments demonstrate the effectiveness of MOBE: (1) the sampling strategy reduces ADRS by up to 28.2% and increases hypervolume by up to 15.1% compared with existing sampling methods; (2) the greedy-based hybrid surrogate model reduces ADRS by up to 31.7% and improves hypervolume by up to 20.0% relative to single-surrogate approaches; (3) the overall MOBE framework reduces ADRS by up to 34.9% and increases hypervolume by up to 28.7% compared with state-of-the-art DSE techniques; (4) MOBE improves throughput by up to 29.9%, reduces area overhead by 6.0%, and increases FU utilization by 11.6% relative to existing methods; and (5) MOBE exhibits excellent cross-algorithm stability in hypervolume and NOET. MOBE is applicable to medium- and high-performance cryptographic application scenarios, including cloud platforms and desktop terminals. Nevertheless, two limitations remain. First, MOBE currently employs only traditional surrogate models, which may limit feature-learning efficiency and modeling accuracy. Second, it has been validated only on a CGRCA architecture previously developed by the research group, not on other existing CGRCA architectures. Future work will address these limitations by incorporating emerging artificial intelligence techniques, such as large models, and by conducting extensive experiments on diverse CGRCA architectures to further improve the generalization and effectiveness of MOBE.
- Coarse-Grained Reconfigurable Cipher logic Array (CGRCA) /
- Design Space Exploration (DSE) /
- Bayesian optimization /
- Random forest /
- Neural network
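The adaptive multi-acquisition function is described only as EHVI and qUCB combined with a dynamically adjusted weighting ratio. The sketch below shows one way such a blend could look, assuming a linear schedule that shifts weight from the exploratory UCB term toward EHVI as iterations progress, min-max normalization of both scores, and EHVI values computed elsewhere and passed in. The schedule, the normalization, and the names `ucb`, `blend_weight`, and `multi_acquisition` are hypothetical and are not the authors' formulation; UCB here is the single-point form mu + sqrt(beta) * sigma rather than a batch quasi-Monte Carlo estimate.

```python
import math

def ucb(mu, sigma, beta=2.0):
    """Single-point Upper Confidence Bound: predicted mean plus scaled uncertainty."""
    return mu + math.sqrt(beta) * sigma

def normalize(scores):
    """Min-max scale scores to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def blend_weight(iteration, total_iterations):
    """Hypothetical linear schedule: the EHVI weight grows from 0 to 1."""
    return iteration / max(total_iterations - 1, 1)

def multi_acquisition(ehvi, mu, sigma, iteration, total_iterations, beta=2.0):
    """Blend normalized EHVI and UCB scores for each candidate (assumed scheme)."""
    w = blend_weight(iteration, total_iterations)
    ucb_scores = [ucb(m, s, beta) for m, s in zip(mu, sigma)]
    e, u = normalize(ehvi), normalize(ucb_scores)
    return [w * ei + (1.0 - w) * ui for ei, ui in zip(e, u)]

def best_candidate(scores):
    """Index of the candidate with the largest acquisition value."""
    return max(range(len(scores)), key=scores.__getitem__)

# Toy inputs for three candidates (not data from this paper)
ehvi = [0.10, 0.30, 0.05]
mu = [0.2, 0.1, 0.4]
sigma = [0.05, 0.02, 0.3]
print(best_candidate(multi_acquisition(ehvi, mu, sigma, 0, 10)))  # early: UCB dominates
print(best_candidate(multi_acquisition(ehvi, mu, sigma, 9, 10)))  # late: EHVI dominates
```

Early in the search the uncertain third candidate wins on UCB; by the last iteration the weight has moved entirely to EHVI and the second candidate wins, which illustrates the exploration-to-exploitation shift the abstract describes.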


Figure 1 Composition of the high-performance CGRCA

Figure 2 Schematic diagram of the Pareto front

Figure 3 Overview of MOBE

Figure 4 Consistency curve between the rapid evaluation model and DC

Figure 5 Preliminary experiment on the number of iterations

Figure 6 Comparison of performance metrics across DSE methods

Figure 7 Comparison of metrics across DSE methods for multiple cryptographic algorithms

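Table 1 CGRCA design parameters

Parameter	Symbol	Level	Range
Number of reconfigurable processing stages	r	CGRA	1–32
Number of PEs per reconfigurable processing stage	c	CGRA	4–8
Number of logic units per PE	FU1	Processing element	1–4
Number of modular addition units per PE	FU2	Processing element	1–4
Number of modular multiplication units per PE	FU3	Processing element	1–4
Number of shift units per PE	FU4	Processing element	1–4
Number of permutation units per PE	FU5	Processing element	1–4
Number of finite-field multiplication units per PE	FU6	Processing element	1–4
Bit width of the forward cross-stage interconnect	K1	Global interconnect	1–4
Bit width of the backward feedback interconnect	K2	Global interconnect	1–4
Span length of the forward cross-stage interconnect	P1	Global interconnect	4–32
Span length of the backward feedback interconnect	P2	Global interconnect	4–32
Number of memories	MN	Memory	4–16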
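Encoding Table 1 makes the scale of the design space concrete. Assuming every parameter independently takes every integer in its listed range (the table gives ranges, not admissibility constraints, so the feasible space may be smaller), the space contains 32 × 5 × 4^6 × 4^2 × 29^2 × 13 ≈ 1.15 × 10^11 points. The sketch below computes that count and maps a Halton point, the quasi-random sequence that Algorithms 1 and 2 call through Halton(·), onto integer parameter values. The uniform binning used in `to_design` and the names `PARAM_RANGES`, `halton_point`, and `to_design` are illustrative assumptions, not the authors' encoding.

```python
import math

# (symbol, low, high) from Table 1; integer values assumed, bounds inclusive
PARAM_RANGES = [
    ("r", 1, 32), ("c", 4, 8),
    ("FU1", 1, 4), ("FU2", 1, 4), ("FU3", 1, 4),
    ("FU4", 1, 4), ("FU5", 1, 4), ("FU6", 1, 4),
    ("K1", 1, 4), ("K2", 1, 4),
    ("P1", 4, 32), ("P2", 4, 32),
    ("MN", 4, 16),
]

# One distinct prime base per dimension, as in the standard Halton construction
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41]

def design_space_size():
    """Number of parameter combinations if all ranges are independent."""
    return math.prod(high - low + 1 for _, low, high in PARAM_RANGES)

def radical_inverse(index, base):
    """Van der Corput radical inverse of a non-negative integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result

def halton_point(index):
    """The index-th Halton point in [0, 1)^13 (start at index 1 to skip the origin)."""
    return [radical_inverse(index, p) for p in PRIMES]

def to_design(u):
    """Map a point in [0, 1)^13 to one integer value per Table 1 parameter."""
    return {
        name: low + int(ui * (high - low + 1))
        for (name, low, high), ui in zip(PARAM_RANGES, u)
    }

print(design_space_size())  # 114640814080
print(to_design(halton_point(1)))
```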


Algorithm 1 Knowledge-aware unsupervised learning sampling strategy

Input: design space D; number of initial samples N
Output: initial sample set X
(1) X ← ∅;
(2) T ← Halton(D, N); // build the candidate sample set
(3) l, LR ← Hierarchical_Cluster(T, l_max, weight); // number of sublayers l and the set LR of reconfigurable-processing-stage ranges of the sublayers
(4) LSN ← NPS(l, LR); // set LSN of sample clusters over all sublayers
(5) for i ← 1 to l do
(6)   LS_i ← LS_i ∪ Halton(lsn_i, lr_i, Len(lsn_i)); // candidate sample set LS_i of sublayer i
(7)   p_i, C_i ← EC_Kmeans(LS_i, cn_max); // number of clusters p_i and set C_i of all clusters in sublayer i
(8)   for j ← 1 to p_i do
(9)     $x_{ij}^*$ ← Centroid(c_ij); // take the cluster centroid as the candidate sample
(10)  end for
(11)  while not converged do
(12)    for j ← 1 to p_i do
(13)      for all $\boldsymbol{x} \in c_{ij}$ do
(14)        $R(\boldsymbol{x}) \leftarrow \dfrac{1}{|c_{ij}| - 1} \displaystyle\sum\nolimits_{\boldsymbol{x}' \in c_{ij}} \|\boldsymbol{x} - \boldsymbol{x}'\|$; // representativeness
(15)        $D(\boldsymbol{x}) \leftarrow \min_{\boldsymbol{x}' \in \{x_{in}^*\}_{n=1}^{p_i} \setminus \{x_{ij}^*\}} \|\boldsymbol{x} - \boldsymbol{x}'\|$; // diversity
(16)      end for
(17)      $x_{ij} \leftarrow \arg\max_{\boldsymbol{x} \in c_{ij}} [D(\boldsymbol{x}) - R(\boldsymbol{x})]$;
(18)      $\{x_{in}^*\}_{n=1}^{p_i} \leftarrow (\{x_{in}^*\}_{n=1}^{p_i} \setminus \{x_{ij}^*\}) \cup \{x_{ij}\}$, i.e., $x_{ij}^* \leftarrow x_{ij}$;
(19)    end for
(20)  end while
(21) end for
(22) return $X = \{\{x_{mn}^*\}_{n=1}^{p_m}\}_{m=1}^{l}$
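Steps (8) to (20) of Algorithm 1 start from the cluster centroids and then repeatedly replace each cluster's representative with the member that maximizes D(x) − R(x). The sketch below implements only that refinement loop in plain Python, assuming the clusters are already given (the sublayer split, NPS, and EC_Kmeans steps are not reproduced), Euclidean distance, and a fixed round cap standing in for the unspecified convergence test. The function names are illustrative.

```python
import math

def centroid(cluster):
    """Coordinate-wise mean of a cluster (step 9)."""
    n = len(cluster)
    return [sum(p[k] for p in cluster) / n for k in range(len(cluster[0]))]

def representativeness(x, cluster):
    """R(x): mean distance from x to the other members of its cluster (step 14)."""
    if len(cluster) < 2:
        return 0.0
    # x itself contributes a zero distance, so dividing by |c| - 1 averages over the others
    return sum(math.dist(x, y) for y in cluster) / (len(cluster) - 1)

def diversity(x, reps, j):
    """D(x): distance from x to the nearest representative of another cluster (step 15)."""
    others = [r for k, r in enumerate(reps) if k != j]
    return min(math.dist(x, r) for r in others) if others else 0.0

def refine_representatives(clusters, max_rounds=50):
    """Steps (8)-(20): start from centroids, then swap in the member maximizing D - R."""
    reps = [centroid(c) for c in clusters]
    for _ in range(max_rounds):  # assumed cap in place of 'while not converged'
        changed = False
        for j, cluster in enumerate(clusters):
            best = max(cluster, key=lambda x: diversity(x, reps, j) - representativeness(x, cluster))
            if list(best) != reps[j]:
                reps[j] = list(best)  # step (18): replace the old representative
                changed = True
        if not changed:
            break
    return reps

# Two toy clusters in a 2-D normalized parameter space (not data from this paper)
clusters = [
    [(0, 0), (1, 0), (0, 1)],
    [(10, 10), (11, 10), (10, 11)],
]
print(refine_representatives(clusters))
```

Because each representative is always an actual cluster member after the first round, the returned set consists of evaluable design points rather than centroids, which may fall between legal integer configurations.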


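Algorithm 2 MOBE

Input: design space D; number of initial samples N; number of iterations M
Output: Pareto-optimal set P; optimal solution P^*
(1) X ← Ini_Sampling(D, N); // initial sampling
(2) Y ← Evaluation(X); // evaluate performance
(3) D ← D \ X;
(4) Q ← (X, Y);
(5) Initialize surrogate models;
(6) HV ← ∅; // initialize the hypervolume record
(7) for i ← 1 to M do
(8)   C ← Halton(D, m); // draw m uniformly distributed samples as the candidate set
(9)   x1_i ← arg max(MAcq(C, M₁)); // sample with the largest acquisition value under the DKL-GP model
(10)  x2_i ← arg max(MAcq(C, M₂)); // sample with the largest acquisition value under the random forest model
(11)  x3_i ← arg max(MAcq(C, M₃)); // sample with the largest acquisition value under the neural network model
(12)  $x_i^*$ ← arg max(MAcq(x1_i, x2_i, x3_i)); // best sample of this iteration
(13)  $y_i^*$ ← Evaluation($x_i^*$); // evaluate performance
(14)  Q ← Q ∪ {($x_i^*$, $y_i^*$)};
(15)  D ← D \ {$x_i^*$};
(16)  HV ← HV ∪ Cal_HV(Q); // update the hypervolume record
(17) end for
(18) P ← Pareto(Q); // compute the Pareto-optimal set
(19) P^* ← Max_TH(P); // select the solution with the highest throughput as the optimal solution
(20) return Pareto-optimal set P and optimal solution P^*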
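Steps (12), (16), and (18) of Algorithm 2 rely on a greedy choice among the three per-model candidates, a hypervolume computation, and a Pareto filter, none of which is spelled out here. The sketch below assumes two objectives (throughput, maximized, and area, minimized; FU utilization is omitted to keep the hypervolume exact and short), a user-chosen reference point, and a greedy rule that picks the candidate whose predicted objectives add the most hypervolume, following the abstract's description of prioritizing samples that expand hypervolume most. The names `pareto`, `hypervolume_2d`, `to_min`, and `greedy_pick` are illustrative.

```python
def dominates(a, b):
    """True if a dominates b when every objective is minimized."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto(points):
    """Non-dominated subset of the observed objective vectors (Pareto(Q))."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

def hypervolume_2d(points, ref):
    """Area dominated by the Pareto front and bounded by ref (both objectives minimized)."""
    front = sorted(p for p in pareto(points) if p[0] < ref[0] and p[1] < ref[1])
    area, prev_y = 0.0, ref[1]
    for x, y in front:  # x ascending, so y descends along a 2-D front
        area += (ref[0] - x) * (prev_y - y)
        prev_y = y
    return area

def to_min(throughput, area):
    """Negate throughput so that both objectives are minimized."""
    return (-throughput, area)

def greedy_pick(candidates, observed, ref):
    """Index of the predicted objective vector that adds the most hypervolume."""
    base = hypervolume_2d(observed, ref)
    gains = [hypervolume_2d(observed + [c], ref) - base for c in candidates]
    return max(range(len(candidates)), key=gains.__getitem__)

# Toy values: observed designs and one predicted point per surrogate (M1, M2, M3)
observed = [to_min(10, 5), to_min(6, 3)]
candidates = [to_min(8, 4), to_min(12, 6), to_min(5, 9)]
ref = (0, 10)  # throughput 0 and area 10 as the assumed worst-case bound
print(greedy_pick(candidates, observed, ref))  # 1
```

The third candidate is dominated by an observed design and adds nothing; the second extends the front to a higher throughput and adds the largest area, so it is chosen. With three objectives the same greedy rule applies, but an exact 3-D hypervolume routine would replace `hypervolume_2d`.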


Table 2 Design parameter settings

Configuration No.	r	c	FU1	FU2	FU3	FU4	FU5	FU6	K1	K2	P1	P2	MN
1	2	4	1	2	1	3	1	1	1	4	5	18	16
2	5	5	2	1	1	1	4	1	2	2	8	4	8
3	8	8	1	4	2	3	1	2	2	1	13	24	4
4	10	8	1	3	1	2	1	4	4	2	32	14	9
5	15	6	4	1	3	2	1	2	4	4	2	30	12
6	18	4	1	1	4	1	2	1	1	2	15	16	7
7	20	8	2	2	1	2	1	1	2	1	18	9	15
8	24	7	1	1	1	1	4	3	3	1	28	25	5
9	28	4	3	2	1	3	1	2	2	3	12	6	10
10	32	4	2	1	1	2	2	2	2	2	16	16	4


Table 3 Experimental results of different sampling strategies

Sampling algorithm	ADRS	Hypervolume	NOET
MOBE-RS	0.039	0.557	1.000
MOBE-MS	0.035	0.573	0.984
MOBE-US	0.032	0.595	0.980
MOBE	0.028	0.641	0.934
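ADRS is reported in Tables 3 to 5 without its definition in this section. A commonly used form averages, over each point γ of the reference Pareto set Γ, the distance to the nearest point ω of the learned set Ω, so lower is better: ADRS(Γ, Ω) = (1/|Γ|) Σ_{γ∈Γ} min_{ω∈Ω} δ(γ, ω). The sketch below assumes δ is the Euclidean distance on objective vectors already normalized to comparable scales; other DSE work uses a relative maximum-deviation δ instead, so this choice is an assumption rather than the paper's definition.

```python
import math

def adrs(reference_set, learned_set):
    """Mean distance from each reference Pareto point to its nearest learned point."""
    total = sum(min(math.dist(g, w) for w in learned_set) for g in reference_set)
    return total / len(reference_set)

# Toy normalized objective vectors, not data from this paper
reference = [(0.0, 1.0), (0.5, 0.5), (1.0, 0.0)]
learned = [(0.0, 1.0), (0.6, 0.5), (1.0, 0.3)]
print(round(adrs(reference, learned), 3))  # 0.133
```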


Table 4 Experimental results of different surrogate models

Surrogate model	ADRS	Hypervolume	NOET
MOBE-RF	0.041	0.577	0.443
MOBE-GP	0.034	0.534	0.788
MOBE-NN	0.031	0.541	0.326
MOBE	0.028	0.641	0.934


Table 5 Experimental results of MOBE, BOOM-Explorer, and AUGER

DSE method	ADRS	Hypervolume	NOET
BOOM-Explorer	0.043	0.498	0.708
AUGER	0.038	0.538	0.927
MOBE	0.028	0.641	0.934
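The ADRS and hypervolume percentages quoted in the abstract are relative changes of MOBE against the weakest entry in each of Tables 3 to 5. The short check below reproduces them from the table values; the helper name `rel_change` is illustrative.

```python
def rel_change(new, old):
    """Signed relative change of new versus old, in percent."""
    return (new - old) / old * 100

# (MOBE value, baseline value) pairs taken from Tables 3 to 5
checks = {
    "Table 3 ADRS vs MOBE-RS": (0.028, 0.039),
    "Table 3 HV vs MOBE-RS": (0.641, 0.557),
    "Table 4 ADRS vs MOBE-RF": (0.028, 0.041),
    "Table 4 HV vs MOBE-GP": (0.641, 0.534),
    "Table 5 ADRS vs BOOM-Explorer": (0.028, 0.043),
    "Table 5 HV vs BOOM-Explorer": (0.641, 0.498),
}
for label, (new, old) in checks.items():
    print(f"{label}: {rel_change(new, old):+.1f}%")
```

The printed values are -28.2%, +15.1%, -31.7%, +20.0%, -34.9%, and +28.7%, matching the "up to" figures in the abstract; against AUGER, the stronger baseline in Table 5, the gains are smaller.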


Table 6 Comparison of the coefficient of variation (CV) of multiple metrics across DSE methods (%)

DSE method	Hypervolume CV	ADRS CV	NOET CV
MOBE	10.29	22.56	6.74
AUGER	12.68	19.33	7.27
BOOM-Explorer	12.38	16.25	7.14
MOBE-NN	17.16	22.64	4.21
MOBE-GP	16.31	21.57	5.31
MOBE-RF	12.03	33.03	16.33
MOBE-US	14.90	24.97	8.25
MOBE-MS	14.28	24.46	8.38
MOBE-RS	18.36	19.39	7.88
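Table 6 reports the coefficient of variation (CV), the standard deviation divided by the mean, of each metric across the cryptographic algorithms; a smaller CV means a method's results vary less from one algorithm to the next. The per-algorithm measurements are not listed here, so the sketch uses toy values, and because the section does not state whether the population or sample standard deviation is used, the population form is assumed.

```python
import statistics

def cv_percent(values):
    """Coefficient of variation in percent, using the population standard deviation."""
    return statistics.pstdev(values) / statistics.fmean(values) * 100

# Toy per-algorithm metric values for one DSE method (not data from this paper)
toy_values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(f"{cv_percent(toy_values):.1f}%")  # 40.0%
```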


References (17)

[1]	DESHWAL A, JAYAKODI N K, JOARDAR B K, et al. MOOS: A multi-objective design space exploration and optimization framework for NoC enabled manycore systems[J]. ACM Transactions on Embedded Computing Systems (TECS), 2019, 18(5s): 77. doi: 10.1145/3358206.
[2]	KIRKPATRICK S, GELATT JR C D, and VECCHI M P. Optimization by simulated annealing[J]. Science, 1983, 220(4598): 671–680. doi: 10.1126/science.220.4598.671.
[3]	DEB K, PRATAP A, AGARWAL S, et al. A fast and elitist multiobjective genetic algorithm: NSGA-II[J]. IEEE Transactions on Evolutionary Computation, 2002, 6(2): 182–197. doi: 10.1109/4235.996017.
[4]	ZHANG Qingfu and LI Hui. MOEA/D: A multiobjective evolutionary algorithm based on decomposition[J]. IEEE Transactions on Evolutionary Computation, 2007, 11(6): 712–731. doi: 10.1109/TEVC.2007.892759.
[5]	WENG Jian, LIU Sihao, DADU V, et al. DSAGEN: Synthesizing programmable spatial accelerators[C]. 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture, Valencia, Spain, 2020: 268–281. doi: 10.1109/ISCA45697.2020.00032.
[6]	TAN Cheng, XIE Chenhao, LI Ang, et al. AURORA: Automated refinement of coarse-grained reconfigurable accelerators[C]. 2021 Design, Automation & Test in Europe Conference & Exhibition, Grenoble, France, 2021: 1388–1393. doi: 10.23919/DATE51398.2021.9473955.
[7]	BANDARA T K, WIJERATHNE D, MITRA T, et al. REVAMP: A systematic framework for heterogeneous CGRA realization[C]. The 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 2022: 918–932. doi: 10.1145/3503222.3507772.
[8]	JOARDAR B K, KIM R G, DOPPA J R, et al. Learning-based application-agnostic 3D NoC design for heterogeneous manycore systems[J]. IEEE Transactions on Computers, 2019, 68(6): 852–866. doi: 10.1109/TC.2018.2889053.
[9]	QI Sirui, LI Yingheng, PASRICHA S, et al. MOELA: A multi-objective evolutionary/learning design space exploration framework for 3D heterogeneous manycore platforms[C]. 2023 Design, Automation & Test in Europe Conference & Exhibition, Antwerp, Belgium, 2023: 1–6. doi: 10.23919/DATE56975.2023.10137276.
[10]	KIM R G, DOPPA J R, and PANDE P P. Machine learning for design space exploration and optimization of manycore systems[C]. 2018 IEEE/ACM International Conference on Computer-Aided Design, San Diego, USA, 2018: 1–6. doi: 10.1145/3240765.3243483.
[11]	LOPES A S B and PEREIRA M M. A machine learning approach to accelerating DSE of reconfigurable accelerator systems[C]. 2020 33rd Symposium on Integrated Circuits and Systems Design, Campinas, Brazil, 2020: 1–6. doi: 10.1109/SBCCI50935.2020.9189899.
[12]	LI Jingyuan, QIU Yunhui, ZHU Guowei, et al. THRAM: A template-based heterogeneous CGRA modeling framework supporting fast DSE[C]. 2023 IEEE International Symposium on Circuits and Systems, Monterey, USA, 2023: 1–5. doi: 10.1109/ISCAS46773.2023.10182204.
[13]	PENG Bingbing, SUN Shaoyang, DAI Yuan, et al. PRAD: A Bayesian optimization-based DSE framework for parameterized reconfigurable architecture design[C]. 2023 IEEE 31st Annual International Symposium on Field-Programmable Custom Computing Machines, Marina Del Rey, USA, 2023: 226–226. doi: 10.1109/FCCM57271.2023.00054.
[14]	KUANG Huizhen, ZHENG Su, and WANG Lingli. Automated design space exploration of coarse-grained reconfigurable architecture via Bayesian optimization[C]. 2022 IEEE 16th International Conference on Solid-State & Integrated Circuit Technology, Nanjing, China, 2022: 1–3. doi: 10.1109/ICSICT55466.2022.9963336.
[15]	DAI Yuan, LI Jingyuan, ZHU Qilong, et al. HETA: A heterogeneous temporal CGRA modeling and design space exploration via Bayesian optimization[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2024, 32(3): 505–518. doi: 10.1109/TVLSI.2023.3344536.
[16]	BAI Chen, SUN Qi, ZHAI Jianwang, et al. BOOM-Explorer: RISC-V BOOM microarchitecture design space exploration framework[C]. 2021 IEEE/ACM International Conference on Computer Aided Design, Munich, Germany, 2021: 1–9. doi: 10.1109/ICCAD51958.2021.9643455.
[17]	LI Jingyuan, HU Yihan, DAI Yuan, et al. AUGER: A multi-objective design space exploration framework for CGRAs[C]. 2023 International Conference on Field Programmable Technology, Yokohama, Japan, 2023: 88–95. doi: 10.1109/ICFPT59805.2023.00015.