高级搜索

留言板

尊敬的读者、作者、审稿人, 关于本刊的投稿、审稿、编辑和出版的任何问题, 您可以本页添加留言。我们将尽快给您答复。谢谢您的支持!

姓名
邮箱
手机号码
标题
留言内容
验证码

应用于边缘端视觉感知系统的低功耗片上缓冲存储器

陈漠 张静 王艳蓉 麦麦提·那扎买提 乔飞

陈漠, 张静, 王艳蓉, 麦麦提·那扎买提, 乔飞. 应用于边缘端视觉感知系统的低功耗片上缓冲存储器[J]. 电子与信息学报. doi: 10.11999/JEIT250466
引用本文: 陈漠, 张静, 王艳蓉, 麦麦提·那扎买提, 乔飞. 应用于边缘端视觉感知系统的低功耗片上缓冲存储器[J]. 电子与信息学报. doi: 10.11999/JEIT250466
CHEN Mo, ZHANG Jing, WANG Yanrong, NAZHAMAITI Maimaiti, QIAO Fei. Design of Low-Power On-Chip Cache for Visual Perception Systems on the Edge[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250466
Citation: CHEN Mo, ZHANG Jing, WANG Yanrong, NAZHAMAITI Maimaiti, QIAO Fei. Design of Low-Power On-Chip Cache for Visual Perception Systems on the Edge[J]. Journal of Electronics & Information Technology. doi: 10.11999/JEIT250466

应用于边缘端视觉感知系统的低功耗片上缓冲存储器

doi: 10.11999/JEIT250466 cstr: 32379.14.JEIT250466
基金项目: 北京市自然科学基金(L253009),新疆维吾尔自治区重点研发计划项(2022B01008-3),国家自然科学基金项目(92164203, 62334006)
详细信息
    作者简介:

    陈漠:男,硕士生,研究方向为低功耗智能视觉感知集成电路设计

    张静:女,教授,研究方向为智能感知器件

    王艳蓉:女,讲师,研究方向为气敏传感器

    麦麦提·那扎买提:男,博士,研究方向为超低功耗智能视觉感知芯片设计,面向视觉感知的能量采集和能量管理架构和电路设计

    乔飞:男,副研究员,研究方向为智能感知集成电路与系统

    通讯作者:

    乔飞 qiaofei@tsinghua.edu.cn

  • 中图分类号: TN492

Design of Low-Power On-Chip Cache for Visual Perception Systems on the Edge

Funds: The Beijing Natural Science Foundation (L253009), The Key Research and Development Program of Xinjiang Uygur Autonomous Region (2022B01008-3), The National Natural Science Foundation of China (92164203, 62334006)
  • 摘要: 视觉感知系统通过算法提取信息,但其能效受限于感知过程中数据转换与搬移的功耗开销。采用片上缓存实现边缘端系统中数据的存储与交互,通过减少对冗余信息的传输与搬运过程中的功耗,可有效降低系统的整体功耗。该研究提出了一种面向边缘计算的低功耗片上缓冲存储器(Cache)设计方案。该方案基于静态随机存取存储器(SRAM)单元,根据系统中二值神经网络层间数据量峰值,将缓存容量定制为40.5 Kb,集成在芯片内,专用于存储视觉感知系统的神经网络层间数据。针对传统单体式片上缓存功耗过高的问题,该方案采用分块式存储架构,依据二值神经网络最大池化层输出特征,将缓存阵列划分为8个独立可关断的72×72位子阵列。通过分块存储机制,数据存取过程中的动态功耗得到了有效降低。同时,该文进一步提出动态存储控制策略,利用卷积运算时层间数据量逐层递减的特性,在存取第2层卷积数据时,仅激活必要子阵列,由存储控制模块动态关闭未使用区块,实现功耗深度优化。在TSMC 180 nm CMOS工艺下仿真,结果表明时钟频率在10 MHz时,相较于单一式架构,分块式缓存在存储第1层卷积数据时,读写动态功耗降低64.97%;结合动态存储控制策略后,存储第2层卷积数据时的读写动态功耗进一步降低52.9%。该设计为边缘端视觉感知系统提供了高能效的片上存储解决方案。
  • 图  1  芯片上部署的四层神经网络的数据流示意图

    图  2  片上缓存存储器位于系统架构中的位置

    图  3  分块存储片上缓存架构

    图  4  缓冲存储器写入请求控制逻辑

    图  5  缓冲存储器读出请求控制逻辑

    图  6  72×72 SRAM子阵列架构

    图  7  72×72 SRAM子阵列版图

    图  8  不同工艺角和温度下写入与读取访问时间延迟

    表  1  设计总结

    工艺 180 nm, 1P6M CMOS
    电源电压 1.2 V
    存储容量 40.5k bit 72 bit × 72 bit × 8 bank
    单元类型 SRAM 6T
    单元版图大小 4.20 μm × 3.98 μm
    存储器版图大小 1.2 mm × 0.8 mm
    读写能耗 190.1 fJ/bit
    下载: 导出CSV

    表  2  不同时钟频率下读写操作动态功耗

    时钟频率(MHz) 温度( °C) 电源电压(V) 写操作动态功耗(μW) 读操作动态功耗(μW) 总动态功耗(μW)
    10 27 1.2 14.24 15.04 29.28
    20 35.75 34.65 70.40
    40 70.14 68.06 138.20
    下载: 导出CSV

    表  3  不同温度下读写操作动态功耗

    温度( °C) 时钟频率(MHz) 电源电压(V) 写操作动态功耗(μW) 读操作动态功耗(μW) 总动态功耗(μW)
    0 10 1.2 17.50 17.11 34.61
    27 14.24 15.04 29.28
    85 30.03 29.23 59.26
    下载: 导出CSV

    表  4  不同电源电压下读写操作动态功耗

    电源电压(V) 温度( °C) 时钟频率(MHz) 写操作动态功耗(μW) 读操作动态功耗(μW) 总动态功耗(μW)
    0.8 27 10 6.82 8.08 14.90
    1.0 11.87 12.54 24.41
    1.2 14.24 15.04 29.28
    下载: 导出CSV

    表  5  单一式与分块存储式片上缓冲存储器存储不同卷积层数据时的读写动态功耗

    片上缓存策略 温度( °C) 时钟频率(MHz) 电源电压(V) 卷积层 写操作动态功耗(μW) 读操作动态功耗(μW) 总动态功耗(μW)
    单一式 27 10 1.2 Conv1 34.78 48.80 83.58
    分块存储 27 10 1.2 Conv1 14.24 15.04 29.28
    Conv2 6.97 6.82 13.79
    下载: 导出CSV

    表  6  本文提出的片上缓冲存储器与其他设计案例的对比

    文献 工艺尺寸(nm) 电源电压(V) 缓存容量(Kb) 读写能耗(fJ) 版图面积(mm2)
    本文 180 1.2 40.5 190.1a 0.960
    [19] 180 1.6 64.0 330.0a 0.690
    [20] 90 0.6 128.0. 126.2a 0.489
    [21] 65 1.0 16.0 32.3a 0.064
    注:a: 单比特读写能耗
    下载: 导出CSV
  • [1] YANG Zhen, ZHANG Jie, JIANG Yunliang, et al. A self-organizing IoT service perception algorithm based on human visual direction-sensitive system[J]. IEEE Internet of Things Journal, 2023, 10(7): 6193–6204. doi: 10.1109/JIOT.2022.3223039.
    [2] RASTOGI A, KUMAR S, AGGARWAL A, et al. IoT-based smart traffic monitoring and control system for urban areas[C]. 2025 Fourth International Conference on Smart Technologies, Communication and Robotics, Sathyamangalam, India, 2025: 1–6. doi: 10.1109/STCR62650.2025.11018946.
    [3] CHEN Jiao, HE Jiayi, CHEN Fangfang, et al. Empowering IoT-based autonomous driving via federated instruction tuning with feature diversity[J]. IEEE Internet of Things Journal, 2025, 12(6): 6095–6108. doi: 10.1109/JIOT.2024.3518615.
    [4] RIZZO L, ZICARI P, CICIRELLI F, et al. A study on consumer-grade EEG headsets in BCI applications[C]. 2024 IEEE Conference on Pervasive and Intelligent Computing, Boracay Island, Philippines, 2024: 67–74. doi: 10.1109/PICom64201.2024.00016.
    [5] YEOLE P, LABADE Y, WABALE N, et al. IoT-enabled smart wearables for improved visual impairment navigation[C]. 2024 International Conference on Decision Aid Sciences and Applications, Manama, Bahrain, 2024: 1–5. doi: 10.1109/DASA63652.2024.10836578.(查阅网上资料,请确认标黄单词是否正确).
    [6] ABBAS N, ZHANG Yan, TAHERKORDI A, et al. Mobile edge computing: A survey[J]. IEEE Internet of Things Journal, 2018, 5(1): 450–465. doi: 10.1109/JIOT.2017.2750180.
    [7] WANG Sai, LI Xiaoyang, and GONG Yi. Energy-efficient task offloading and resource allocation for delay-constrained edge-cloud computing networks[J]. IEEE Transactions on Green Communications and Networking, 2024, 8(1): 514–524. doi: 10.1109/TGCN.2023.3306002.
    [8] LI Ziwei, XU Han, LIU Zheyu, et al. A 2.17μW@120fps ultra-low-power dual-mode CMOS image sensor with senputing architecture[C]. 2022 27th Asia and South Pacific Design Automation Conference, Taipei, China, 2022: 92–93. doi: 10.1109/ASP-DAC52403.2022.9712591.
    [9] PUVIRAJAN T, PAULRAJ R L, KULKARNI S, et al. 6T SRAM: A technical overview[C]. 2023 International Conference on Advances in Electronics, Communication, Computing and Intelligent Information Systems, Bangalore, India, 2023: 698–702. doi: 10.1109/ICAECIS58353.2023.10170407. (查阅网上资料,请确认标黄作者是否正确).
    [10] SINGH T, PRAKASH V, ANWER S S, et al. Analyzing the performance of 6T SRAM cell and 64×64 memory array at lower technology nodes for low power design[C]. 2023 1st International Conference on Circuits, Power and Intelligent Systems, Bhubaneswar, India, 2023: 1–6. doi: 10.1109/CCPIS59145.2023.10291492.
    [11] YU Shimeng, JIANG Hongwu, HUANG Shanshi, et al. Compute-in-memory chips for deep learning: Recent trends and prospects[J]. IEEE Circuits and Systems Magazine, 2021, 21(3): 31–56. doi: 10.1109/MCAS.2021.3092533.
    [12] MITTAL S and VETTER J S. A survey of software techniques for using non-volatile memories for storage and main memory systems[J]. IEEE Transactions on Parallel and Distributed Systems, 2016, 27(5): 1537–1550. doi: 10.1109/TPDS.2015.2442980.
    [13] INCI A, ISGENC M M, and MARCULESCU D. DeepNVM++: Cross-layer modeling and optimization framework of nonvolatile memories for deep learning[J]. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2022, 41(10): 3426–3437. doi: 10.1109/TCAD.2021.3127148.
    [14] RATHI N, KUMAR A, GUPTA N, et al. A review of low-power static random access memory (SRAM) designs[C]. 2023 IEEE Devices for Integrated Circuit, Kalyani, India, 2023: 455–459. doi: 10.1109/DevIC57758.2023.10134887.
    [15] SIMON W A, LEVISSE A, ZAPATER M, et al. A hybrid cache HW/SW stack for optimizing neural network runtime, power and endurance[C]. 2020 IFIP/IEEE 28th International Conference on Very Large Scale Integration, Salt Lake City, USA, 2020: 94–99. doi: 10.1109/VLSI-SOC46417.2020.9344087.
    [16] NAZEMIAN M and SAYEDI S M. Low power SRAM using an optimal number of split bit lines and single-ended sensing[C]. 2023 31st International Conference on Electrical Engineering, Tehran, Islamic Republic of Iran, 2023: 947–950. doi: 10.1109/ICEE59167.2023.10334788.
    [17] NGUYEN D T, BHATTACHARJEE A, MOITRA A, et al. MCAIMem: A mixed SRAM and eDRAM cell for area and energy-efficient on-chip AI memory[J]. IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2024, 32(11): 2023–2036. doi: 10.1109/TVLSI.2024.3439231.
    [18] NAZHAMAITI M, JIANG Weihuang, SU Haijin, et al. Selfputing: A 0.57 μW @ 15 fps vision chip with self-powered in-pixel computing and in-memory computing for visual perception on the edge[C]. 2024 IEEE European Solid-State Electronics Research Conference, Bruges, Belgium, 2024: 585–588. doi: 10.1109/ESSERC62670.2024.10719556.
    [19] COSEMANS S, DEHAENE W, and CATTHOOR F. A low-power embedded SRAM for wireless applications[J]. IEEE Journal of Solid-State Circuits, 2007, 42(7): 1607–1617. doi: 10.1109/JSSC.2007.896693.
    [20] COSEMANS S, DEHAENE W, and CATTHOOR F. A 3.6pJ/access 480MHz, 128Kbit on-Chip SRAM with 850MHz boost mode in 90nm CMOS with tunable sense amplifiers to cope with variability[C]. 34th European Solid-State Circuits Conference, Edinburgh, UK, 2008: 278–281. doi: 10.1109/ESSCIRC.2008.4681846.
    [21] CHEN Yuzong, MU Junjie, KIM H, et al. BP-SCIM: A reconfigurable 8T SRAM macro for bit-parallel searching and computing in-memory[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2023, 70(5): 2016–2027. doi: 10.1109/TCSI.2023.3240303.
  • 加载中
图(8) / 表(6)
计量
  • 文章访问数:  27
  • HTML全文浏览量:  8
  • PDF下载量:  3
  • 被引次数: 0
出版历程
  • 收稿日期:  2025-05-27
  • 修回日期:  2025-08-28
  • 网络出版日期:  2025-09-02

目录

    /

    返回文章
    返回