Journal of Zhejiang University (Engineering Science)  2026, Vol. 60 Issue (2): 287-302    DOI: 10.3785/j.issn.1008-973X.2026.02.007
Computer Technology and Control Engineering
Systematic classification and performance analysis of data deduplication and reduction techniques
Xiaoyan KUI1, Min ZHANG1, Ling XIAO1, Qinsong LI2,*, Liming CHEN3, Wensheng ZHANG4, Beiji ZOU1
1. School of Computer Science and Engineering, Central South University, Changsha 410083, China
2. Big Data Institute, Central South University, Changsha 410083, China
3. École Centrale de Lyon, University of Lyon, Lyon 69134, France
4. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Abstract:

Data reduction techniques are studied in depth to provide effective solutions for optimizing storage systems and improving data processing efficiency. Based on the distribution characteristics of redundant data and on different application scenarios, and starting from data similarity and hierarchical structure, existing data reduction techniques are classified into four categories: duplicate data reduction, inter-file similarity reduction, intra-file similarity reduction, and hybrid reduction. Because data reduction techniques significantly affect the storage efficiency, system response time, data transmission, and reliability of storage systems, the performance of each category is analyzed and summarized, and the advantages and limitations of existing techniques are discussed. Applications of data reduction techniques in multiple scenarios are introduced, and future research challenges and directions are identified.

Key words: data reduction; data deduplication; data compression; storage system; reliability
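Received: 2025-02-20    Published: 2026-02-03
CLC: TP 393
Funding: National Natural Science Foundation of China (U22A2034, 62177047, 62302530); High-End Foreign Expert Recruitment Program of the Ministry of Science and Technology (Guokefazhuan [2023] No. 155); Key Research and Development Program of the Department of Science and Technology of Hunan Province (2024JK2135); Key Project of Xiangjiang Laboratory (23XJ02005); Key Project of the Education Department of Hunan Province (24A0018); Natural Science Foundation of Hunan Province (2023JJ40769); Central South University Frontier Cross-Disciplinary Project (2023QYJC020).
Corresponding author: Qinsong LI    E-mail: xykui@csu.edu.cn; qinsli.cg@csu.edu.cn
About the author: Xiaoyan KUI (born 1980), female, professor, Ph.D., engaged in research on medical big data. orcid.org/0000-0002-9957-7867. E-mail: xykui@csu.edu.cn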

Cite this article:

Xiaoyan KUI, Min ZHANG, Ling XIAO, Qinsong LI, Liming CHEN, Wensheng ZHANG, Beiji ZOU. Systematic classification and performance analysis of data deduplication and reduction techniques. Journal of Zhejiang University (Engineering Science), 2026, 60(2): 287-302.

Link to this article:

https://www.zjujournals.com/eng/CN/10.3785/j.issn.1008-973X.2026.02.007        https://www.zjujournals.com/eng/CN/Y2026/V60/I2/287

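| Category | Target data | Granularity | Scalability | Complexity | Techniques |
| --- | --- | --- | --- | --- | --- |
| Traditional compression | All data | Byte, string | | | Huffman coding, dictionary coding |
| Delta compression | Similar data | Byte, string | Moderate | Moderate | String matching |
| Data deduplication | Duplicate data | Chunk, file | | | Chunking, fingerprint computation |

Table 1  Comparison of data reduction methods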
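
Table 1 names chunking and fingerprint computation as the core techniques of data deduplication. The following is a minimal sketch of that idea, not the implementation of any scheme surveyed in the article. It assumes fixed-size chunks, SHA-1 fingerprints, and an in-memory dictionary as the chunk store; the function names `dedup_fixed` and `restore` are hypothetical.

```python
import hashlib


def dedup_fixed(data: bytes, chunk_size: int = 4):
    """Split data into fixed-size chunks and keep each unique chunk once.

    Returns (store, recipe): store maps fingerprint -> chunk bytes,
    recipe is the ordered list of fingerprints needed to rebuild data.
    """
    store = {}
    recipe = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        # The fingerprint identifies the chunk content (illustrative choice: SHA-1).
        fingerprint = hashlib.sha1(chunk).hexdigest()
        # A chunk whose fingerprint is already known is not stored again.
        store.setdefault(fingerprint, chunk)
        recipe.append(fingerprint)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data by following the recipe."""
    return b"".join(store[fp] for fp in recipe)


data = b"ABCDABCDEFGHABCD"
store, recipe = dedup_fixed(data, chunk_size=4)
```

In this toy input the chunk `ABCD` occurs three times but is stored once, so the store holds two chunks while the recipe still lists four fingerprints. Fixed-size chunking is sensitive to insertions that shift chunk boundaries, which is the motivation for the content-defined chunking algorithms compared in Table 2.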
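Fig. 1  Redundancy distribution at the data chunk level
Fig. 2  Hash-function-based data deduplication method
Fig. 3  Fixed-size chunk-level data deduplication method
Fig. 4  Local maximum chunking algorithm
Fig. 5  Asymmetric extremum chunking algorithm
Fig. 6  Rapid asymmetric maximum chunking method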
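| Algorithm | Chunk size variance | Throughput | Minimum chunk size | Maximum chunk size |
| --- | --- | --- | --- | --- |
| Rabin-CDC | | | Depends on the configured limits | Depends on the configured limits |
| LMC | | | $w$ | $256(w - 1)$ |
| AE | | | $w+1$ | Unlimited |
| RAM | Moderate | Very high | $w+1$ | Unlimited |

Table 2  Comparison of chunking algorithms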
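
Table 2 gives AE a minimum chunk size of $w+1$ and no maximum. The sketch below shows one way an asymmetric-extremum cut point can produce exactly that behavior. It is a simplified illustration under stated assumptions, not the authors' code: single byte values serve as the comparison key, ties count as "not greater", and the names `ae_cut_point` and `ae_chunks` are hypothetical.

```python
def ae_cut_point(data: bytes, w: int) -> int:
    """Return the length of the first chunk of data.

    Track the running maximum. Once w bytes after the maximum have passed
    without exceeding it, cut right after those w bytes.
    """
    if not data:
        return 0
    max_val = data[0]
    max_pos = 0
    for i in range(1, len(data)):
        if data[i] <= max_val:
            if i == max_pos + w:
                return i + 1  # maximum, everything before it, and w bytes after it
        else:
            # A larger value restarts the right-hand window of w bytes.
            max_val = data[i]
            max_pos = i
    return len(data)  # no cut point found: the rest forms the last chunk


def ae_chunks(data: bytes, w: int):
    """Split data into chunks by repeatedly applying ae_cut_point."""
    chunks = []
    pos = 0
    while pos < len(data):
        n = ae_cut_point(data[pos:], w)
        chunks.append(data[pos:pos + n])
        pos += n
    return chunks


w = 3
data = bytes([1, 5, 2, 3, 4, 9, 1, 1, 1, 7])
chunks = ae_chunks(data, w)
```

Every chunk except possibly the last contains the maximum plus the $w$ bytes after it, hence at least $w+1$ bytes. A strictly increasing input never triggers a cut, which is why no maximum chunk size applies.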
Fig. 7  Delta compression method
Fig. 8  Storage gains of different data reduction techniques
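| Scheme | Key idea | Advantages | Disadvantages |
| --- | --- | --- | --- |
| Jingwei[50] | Efficient and adaptive data migration for deduplicated storage systems | Higher storage space utilization; better service adaptability | High computational cost and complexity; depends on chunk analysis and parameter tuning |
| DLDAFE[51] | Two-layer deduplication that raises the deduplication ratio while balancing performance and chunk-boundary effects | Dynamic chunk combination lowers overhead, balances performance and time, and weakens the impact of hard chunk boundaries | Increased complexity; parameters must be tuned; second-layer exact deletion becomes a bottleneck at large data volumes |
| Light-Dedup[52] | Combines hashing with content comparison and optimizes I/O for fast block deduplication | Better I/O performance, lower storage overhead, higher deduplication efficiency | Memory usage depends on the server environment; extra index overhead |
| FSDedup[53] | Error-correcting-code-assisted deduplication; fingerprints identify similar data; eliminates highly referenced redundancy | Lower read overhead for similarity comparison; eliminates more redundant data | Depends on workload locality; extra computation overhead |
| imDedup[54] | Deduplication on the I/O path to improve the performance of cloud primary storage systems | Lower latency impact; flexible threshold setting; optimized use of the memory cache | Dynamic cache adjustment is complex; depends on workload characteristics |

Table 3  Comparison of schemes oriented toward deduplication ratio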
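| Scheme | Key idea | Advantages | Disadvantages |
| --- | --- | --- | --- |
| R-dedup[55] | Relieves the fingerprint bottleneck on solid-state storage and improves overall deduplication efficiency | Markedly higher Secure Hash Algorithm 1 (SHA-1) throughput; less computation; compatible and extensible; small error | High memory overhead; I/O limits further performance gains |
| QuickDedup[57] | Efficient deduplication of virtual machine disk images in cloud environments | High time efficiency; minimal metadata overhead; fast deduplication; broad applicability | Fixed chunk size limits adaptation to data types and lowers the deduplication ratio |
| dCACH[58] | Optimizes the on-disk backup index and mitigates deduplication node islands in distributed deduplication | High scalability, high throughput, high storage efficiency | Increased complexity; high resource consumption; dependence on components |
| SACRO[59] | Bloom filter accelerates ownership verification for fast detection of redundant data | Lower false-positive rate; better chunk locality | Depends on file similarity; complex cache management |
| P-Dedupe[60] | Selective rewriting and cache optimization improve locality and restore performance | Higher deduplication throughput; parallelized content-defined chunking and fingerprint generation | High hardware resource requirements; not suitable for every type of dataset |

Table 4  Comparison of schemes oriented toward backup performance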
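| Scheme | Key idea | Advantages | Disadvantages |
| --- | --- | --- | --- |
| TRS[61] | Addresses deduplication fragmentation and avoids retrieving redundant container chunks during restore | Markedly better restore performance; lower metadata overhead for reclamation; efficient use of storage space | Depends on historical information; sensitive to the utilization threshold |
| FGDEFRAG[62] | Uses variable-sized, adaptively located data groups to precisely identify and eliminate fragmented data | Better restore performance; less rewritten data; precise fragment identification; high adaptability | Parameter sensitive; limited incremental performance; increased storage overhead |
| MFDedup[63] | Optimizes data layout through data classification, preserves backup locality, and alleviates deduplication fragmentation | Better restore performance; less fragmentation; low-overhead garbage collection (GC); simpler management and space utilization | Relies on contiguous backup lifecycles; restore runs sequentially; extra overhead |

Table 5  Comparison of schemes oriented toward restore performance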
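| Scheme | Key idea | Advantages | Disadvantages |
| --- | --- | --- | --- |
| HDS[67] | Reduces random accesses and concurrent metadata overhead, lowers fragmentation, and improves primary storage performance | Lower metadata overhead; higher storage efficiency; optimized I/O performance | Limited deduplication coverage; high complexity; difficult cache management; restricted scalability |
| FADD[68] | Uses file semantic information (such as file type and size) to guide deduplication | Well targeted; lower system overhead; better storage performance; high flexibility | High initial investment and implementation complexity; security and privacy concerns |
| BEDD[69] | Formulates the balanced edge data deduplication problem, with an optimal approach for small-scale scenarios and a suboptimal approach for large-scale scenarios | Jointly optimizes deduplication ratio, storage benefit, and resource balance under latency constraints | Computation overhead is borne mainly by the cloud service provider |
| CA-Dedupe[70] | Classifies write requests by content and searches for duplicates only within specific classes | Lower write latency and memory consumption; saves storage space | Increased software complexity; performance depends on file type |

Table 6  Comparison of schemes oriented toward system overhead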
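| Scheme | Key idea | Advantages | Disadvantages |
| --- | --- | --- | --- |
| RAD[29] | Optimizes GC performance and applies different erasure coding schemes according to the reliability requirements of data chunks | Improved GC performance, reliability, and storage efficiency | High complexity; trade-off between storage and reliability; large performance overhead; consistency hard to guarantee |
| DARM[86] | Deduplication-aware redundancy management that uses semantic information to improve storage reliability | Higher reliability; lower storage overhead | Implementation complexity; performance depends on correct parameter configuration |
| ASDDS[88] | Accounts for privacy leakage and audit forgery to improve data security and trustworthiness | Lower key storage cost for users; guarantees key recoverability and reliable audit results | Implementation complexity; performance depends on correct parameter configuration |
| DCStore[89] | Provides a solution for outsourcing data to the cloud with cost-effectiveness and high availability | Higher availability; lower storage cost; better access performance; stronger fault tolerance | Implementation complexity; dependence on cloud service providers; network bandwidth limits |
| RepEC-Duet[90] | Combines data deduplication and delta compression to improve storage system reliability and performance | Efficient use of storage space; better restore performance; cache locality preserved | Increased complexity; cache management challenges; performance must be balanced against reliability |
| RepEC+[91] | Combines replication with erasure coding and history-driven delta compression to improve reliability and restore efficiency | Efficient use of storage space; better restore performance; less cyclic fragmentation | Increased complexity; cache management challenges; restore performance must be balanced against storage overhead |

Table 7  Comparison of data reduction schemes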
References:

1 JIANG P, SINHA S, ALDAPE K, et al Big data in basic and translational cancer research[J]. Nature Reviews Cancer, 2022, 22 (11): 625- 639
doi: 10.1038/s41568-022-00502-0
2 ACCIARINI C, CAPPA F, BOCCARDELLI P, et al How can organizations leverage big data to innovate their business models? A systematic literature review[J]. Technovation, 2023, 123: 102713
doi: 10.1016/j.technovation.2023.102713
3 International Data Corporation. Worldwide IDC global datasphere forecast, 2024–2028: AI everywhere, but upsurge in data will take time [EB/OL]. (2024−05−31)[2025−11−12]. https://my.idc.com/getdoc.jsp?containerId=US52712424&pageType=PRINTFRIENDLY.
4 DU Yunxiao, CHEN Ke, SHOU Lidan, et al LazyStore: write-optimized key-value storage system based on hybrid storage architecture[J]. Journal of Software, 2025, 36 (2): 805- 829
5 LIU M, PAN L, LIU S Cost optimization for cloud storage from user perspectives: recent advances, taxonomy, and survey[J]. ACM Computing Surveys, 2023, 55 (13s): 1- 37
6 ZHANG T, SHEN J, LAI C F, et al Multi-server assisted data sharing supporting secure deduplication for metaverse healthcare systems[J]. Future Generation Computer Systems, 2023, 140: 299- 310
doi: 10.1016/j.future.2022.10.031
7 OH M, LEE S, JUST S, et al. TiDedup: a new distributed deduplication architecture for ceph [C]// 2023 USENIX Annual Technical Conference. Boston: [s.n.], 2023: 117–131.
8 LIN L, DENG Y, ZHOU Y, et al InDe: an inline data deduplication approach via adaptive detection of valid container utilization[J]. ACM Transactions on Storage, 2023, 19 (1): 1- 27
9 SHAH M, YU X, DI S, et al. Lightweight Huffman coding for efficient GPU compression [C]// Proceedings of the 37th ACM International Conference on Supercomputing. Orlando: ACM, 2023: 99–110.
10 MING Y, WANG C, LIU H, et al Blockchain-enabled efficient dynamic cross-domain deduplication in edge computing[J]. IEEE Internet of Things Journal, 2022, 9 (17): 15639- 15656
doi: 10.1109/JIOT.2022.3150042
11 XIA W, WEI C, LI Z, et al NetSync: a network adaptive and deduplication-inspired delta synchronization approach for cloud storage services[J]. IEEE Transactions on Parallel and Distributed Systems, 2022, 33 (10): 2554- 2570
12 LIU X, AN P, CHEN Y, et al An improved lossless image compression algorithm based on Huffman coding[J]. Multimedia Tools and Applications, 2022, 81 (4): 4781- 4795
doi: 10.1007/s11042-021-11017-5
13 ZHANG Y, ZHANG F, LI H, et al. CompressStreamDB: fine-grained adaptive stream processing without decompression [C]// Proceedings of the IEEE 39th International Conference on Data Engineering. Anaheim: IEEE, 2023: 408–422.
14 BACS A, MUSAEV S, RAZAVI K, et al. DUPEFS: leaking data over the network with filesystem deduplication side channels [C]// 20th USENIX Conference on File and Storage Technologies. Santa Clara: [s.n.], 2022: 281–296.
15 NI F, LIN X, JIANG S. SS-CDC: a two-stage parallel content-defined chunking for deduplicating backup storage [C]// Proceedings of the 12th ACM International Conference on Systems and Storage. Haifa: ACM, 2019: 86–96.
16 ZHANG Y, FU M, WU X, et al Improving restore performance of packed datasets in deduplication systems via reducing persistent fragmented chunks[J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 31 (7): 1651- 1664
doi: 10.1109/TPDS.2020.2972898
17 WU S, DU C, ZHANG W, et al DedupHR: exploiting content locality to alleviate read/write interference in deduplication-based flash storage[J]. IEEE Transactions on Computers, 2022, 71 (6): 1332- 1343
18 CAO Z, LIU S, WU F, et al. Sliding look-back window assisted data chunk rewriting for improving deduplication restore performance [C]// Proceedings of the 17th USENIX Conference on File and Storage Technologies. Boston: ACM, 2019: 129–142.
19 ZHANG D, DENG Y, ZHOU Y, et al Improving the performance of deduplication-based backup systems via container utilization based hot fingerprint entry distilling[J]. ACM Transactions on Storage, 2021, 17 (4): 1- 23
20 XU L J, HAO R, YU J, et al Secure deduplication for big data with efficient dynamic ownership updates[J]. Computers and Electrical Engineering, 2021, 96: 107531
doi: 10.1016/j.compeleceng.2021.107531
21 COGO V, PAULO J, BESSANI A GenoDedup: similarity-based deduplication and delta-encoding for genome sequencing data[J]. IEEE Transactions on Computers, 2021, 70 (5): 669- 681
doi: 10.1109/TC.2020.2994774
22 VESTERGAARD R, LUCANI D E, ZHANG Q. A randomly accessible lossless compression scheme for time-series data [C]// Proceedings of the IEEE INFOCOM 2020 - IEEE Conference on Computer Communications. Toronto: IEEE, 2020: 2145–2154.
23 XIA W, PU L, ZOU X, et al The design of fast and lightweight resemblance detection for efficient post-deduplication delta compression[J]. ACM Transactions on Storage, 2023, 19 (3): 1- 30
24 SHARMA G Analysis of Huffman coding and Lempel–Ziv–Welch (LZW) coding as data compression techniques[J]. International Journal of Scientific Research in Computer Science and Engineering, 2020, 8 (1): 37- 44
25 PERIASAMY J K, LATHA B Efficient hash function–based duplication detection algorithm for data deduplication deduction and reduction[J]. Concurrency and Computation: Practice and Experience, 2021, 33 (3): e5213
doi: 10.1002/cpe.5213
26 AHMED S T, GEORGE L E Lightweight hash-based de-duplication system using the self detection of most repeated patterns as chunks divisors[J]. Journal of King Saud University - Computer and Information Sciences, 2022, 34 (7): 4669- 4678
doi: 10.1016/j.jksuci.2021.04.005
27 JIANG T, YUAN X, CHEN Y, et al FuzzyDedup: secure fuzzy deduplication for cloud storage[J]. IEEE Transactions on Dependable and Secure Computing, 2023, 20 (3): 2466- 2483
doi: 10.1109/TDSC.2022.3185313
28 LI Y, TIAN C, GUO F, et al. ElasticBF: elastic Bloom filter with hotness awareness for boosting read performance in large key-value stores [C]// 2019 USENIX Annual Technical Conference. Renton: [s.n.], 2019: 739–752.
29 LIU T, HE X, ALIBHAI S, et al. Reference-counter aware deduplication in erasure-coded distributed storage system [C]// Proceedings of the IEEE International Conference on Networking, Architecture and Storage. Chongqing: IEEE, 2018: 1–10.
30 NI F, JIANG S. RapidCDC: leveraging duplicate locality to accelerate chunking in CDC-based deduplication systems [C]// Proceedings of the ACM Symposium on Cloud Computing. Santa Cruz: ACM, 2019: 220–232.
31 ZHANG G, XIE H, YANG Z, et al BDKM: a blockchain-based secure deduplication scheme with reliable key management[J]. Neural Processing Letters, 2022, 54 (4): 2657- 2674
doi: 10.1007/s11063-021-10450-9
32 XIA W, ZHOU Y, JIANG H, et al. FastCDC: a fast and efficient content-defined chunking approach for data deduplication [C]// 2016 USENIX Annual Technical Conference. Denver: [s.n.], 2016: 101–114.
33 JIN X, LIU H, YE C, et al Accelerating content-defined chunking for data deduplication based on speculative jump[J]. IEEE Transactions on Parallel and Distributed Systems, 2023, 34 (9): 2568- 2579
doi: 10.1109/TPDS.2023.3290770
34 XIA W, ZOU X, JIANG H, et al The design of fast content-defined chunking for data deduplication based storage systems[J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 31 (9): 2017- 2031
doi: 10.1109/TPDS.2020.2984632
35 BJØRNER N, BLASS A, GUREVICH Y Content-dependent chunking for differential compression, the local maximum approach[J]. Journal of Computer and System Sciences, 2010, 76 (3/4): 154- 203
36 ZHANG Y, YUAN Y, FENG D, et al Improving restore performance for in-line backup system combining deduplication and delta compression[J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 31 (10): 2302- 2314
doi: 10.1109/TPDS.2020.2991030
37 GARG S, SINGH R, OBAIDAT M S, et al Statistical vertical reduction-based data abridging technique for big network traffic dataset[J]. International Journal of Communication Systems, 2020, 33 (4): e4249
doi: 10.1002/dac.4249
38 RANJAN R Canonical Huffman coding based image compression using wavelet[J]. Wireless Personal Communications, 2021, 117 (3): 2193- 2206
doi: 10.1007/s11277-020-07967-y
39 HUSSEIN A M, IDREES A K, COUTURIER R A distributed prediction–compression-based mechanism for energy saving in IoT networks[J]. The Journal of Supercomputing, 2023, 79 (15): 16963- 16999
doi: 10.1007/s11227-023-05317-w
40 NIU B, CAO X, WEI Z, et al Entropy optimized deep feature compression[J]. IEEE Signal Processing Letters, 2021, 28: 324- 328
doi: 10.1109/LSP.2021.3052097
41 COLLET Y. Finite state entropy [EB/OL]. [2025–11–13]. https://fastcompression.blogspot.com/2013/
42 WELCH T A A technique for high-performance data compression[J]. Computer, 1984, 17 (6): 8- 19
43 MAJID A, ROBERTS S G, CILISSEN L, et al Differential coding of perception in the world’s languages[J]. Proceedings of the National Academy of Sciences of the United States of America, 2018, 115 (45): 11369- 11376
44 RAJKUMAR K, HARIHARAN U, DHANAKOTI V, et al A secure framework for managing data in cloud storage using rapid asymmetric maximum based dynamic size chunking and fuzzy logic for deduplication[J]. Wireless Networks, 2024, 30 (1): 321- 334
doi: 10.1007/s11276-023-03448-9
45 LI M, WANG H, YANG L, et al Fast hybrid dimensionality reduction method for classification based on feature selection and grouped feature extraction[J]. Expert Systems with Applications, 2020, 150: 113277
doi: 10.1016/j.eswa.2020.113277
46 ZHANG B, WANG C, ZHOU B B, et al DCDedupe: selective deduplication and delta compression with effective routing for distributed storage[J]. Journal of Grid Computing, 2018, 16 (2): 195- 209
doi: 10.1007/s10723-018-9429-3
47 ZHANG Y, JIANG H, FENG D, et al. LoopDelta: embedding locality-aware opportunistic delta compression in inline deduplication for highly efficient data reduction [C]// 2023 USENIX Annual Technical Conference. Boston: [s.n.], 2023: 133–148.
48 TAN H, ZHANG Z, ZOU X, et al. Exploring the potential of fast delta encoding: marching to a higher compression ratio [C]// Proceedings of the IEEE International Conference on Cluster Computing. Kobe: IEEE, 2020: 198–208.
49 XIA J, CHENG G, LUO L, et al The doctrine of MEAN: realizing deduplication storage at unreliable edge[J]. IEEE Transactions on Parallel and Distributed Systems, 2023, 34 (10): 2811- 2826
doi: 10.1109/TPDS.2023.3305460
50 CHENG G, GUO D, LUO L, et al. Jingwei: an efficient and adaptable data migration strategy for deduplicated storage systems [C]// Proceedings of the IEEE INFOCOM 2022 - IEEE Conference on Computer Communications. London: IEEE, 2022: 1659–1668.
51 WANG Qingsong, GE Hui Double layer deduplication algorithm based on fingerprint extremum[J]. Journal of Liaoning University: Natural Sciences Edition, 2018, 45 (3): 201- 207
52 QIU J, PAN Y, XIA W, et al. Light-Dedup: a light-weight inline deduplication framework for non-volatile memory file systems [C]// 2023 USENIX Annual Technical Conference. Boston: [s.n.], 2023: 101−116.
53 DU C, LIN Z, WU S, et al FSDedup: feature-aware and selective deduplication for improving performance of encrypted non-volatile main memory[J]. ACM Transactions on Storage, 2024, 20 (4): 1- 33
54 DENG C, CHEN Q, ZOU X, et al. imDedup: a lossless deduplication scheme to eliminate fine-grained redundancy among images [C]// Proceedings of the IEEE 38th International Conference on Data Engineering. Kuala Lumpur: IEEE, 2022: 1071–1084.
55 WANG Longxiang, DONG Kai, WANG Pengbo, et al R-dedup: a performance improvement strategy for fingerprint calculation of data de-duplication[J]. Journal of Xi’an Jiaotong University, 2021, 55 (1): 43- 51
56 XIANG L, ZHAO X, RAO J, et al. Characterizing the performance of intel optane persistent memory: a close look at its on-DIMM buffering [C]// Proceedings of the Seventeenth European Conference on Computer Systems. Rennes: ACM, 2022: 488–505.
57 SAHARAN S, SOMANI G, GUPTA G, et al QuickDedup: efficient VM deduplication in cloud computing environments[J]. Journal of Parallel and Distributed Computing, 2020, 139: 18- 31
doi: 10.1016/j.jpdc.2020.01.002
58 DAGNAW G, ZHOU K, WANG H dCACH: content aware clustered and hierarchical distributed deduplication[J]. Journal of Software Engineering and Applications, 2019, 12 (11): 460- 490
doi: 10.4236/jsea.2019.1211029
59 DAGNAW G, ZHOU K, WANG H SACRO: solid state drive-assisted chunk caching for restore optimization[J]. Concurrency and Computation: Practice and Experience, 2023, 35 (18): e6162
doi: 10.1002/cpe.6162
60 XIA W, FENG D, JIANG H, et al Accelerating content-defined-chunking based data deduplication by exploiting parallelism[J]. Future Generation Computer Systems, 2019, 98: 406- 418
doi: 10.1016/j.future.2019.02.008
61 LIN L, DENG Y, ZHOU Y. Improving restore performance of deduplication systems via a greedy rewriting scheme [C]// Proceedings of the IEEE 27th International Conference on Parallel and Distributed Systems. Beijing: IEEE, 2022: 291–298.
62 TAN Y, WANG B, WEN J, et al Improving restore performance in deduplication-based backup systems via a fine-grained defragmentation approach[J]. IEEE Transactions on Parallel and Distributed Systems, 2018, 29 (10): 2254- 2267
doi: 10.1109/TPDS.2018.2828842
63 ZOU X, YUAN J, SHILANE P, et al From hyper-dimensional structures to linear structures: maintaining deduplicated data’s locality[J]. ACM Transactions on Storage, 2022, 18 (3): 1- 28
64 ZOU X, YUAN J, SHILANE P, et al. The dilemma between deduplication and locality: can both be achieved [C]// 19th USENIX Conference on File and Storage Technologies (FAST 21). [S.l.]: USENIX Association, 2021: 171−185.
65 XIAO L, ZOU B, ZHU C, et al ESDedup: an efficient and secure deduplication scheme based on data similarity and blockchain for cloud-assisted medical storage systems[J]. The Journal of Supercomputing, 2023, 79 (3): 2932- 2960
doi: 10.1007/s11227-022-04746-3
66 LIU J, CHAI Y P, QIN X, et al Endurable SSD-based read cache for improving the performance of selective restore from deduplication systems[J]. Journal of Computer Science and Technology, 2018, 33 (1): 58- 78
doi: 10.1007/s11390-018-1808-5
67 GODAVARI A, SUDHAKAR C, RAMESH T Hybrid deduplication system: a block-level similarity-based approach[J]. IEEE Systems Journal, 2021, 15 (3): 3860- 3870
doi: 10.1109/JSYST.2020.3012702
68 GODAVARI A, SUDHAKAR C, RAMESH T File semantic aware primary storage deduplication system[J]. IETE Journal of Research, 2023, 69 (11): 7945- 7957
doi: 10.1080/03772063.2022.2050306
69 LUO R, JIN H, HE Q, et al Enabling balanced data deduplication in mobile edge computing[J]. IEEE Transactions on Parallel and Distributed Systems, 2023, 34 (5): 1420- 1431
doi: 10.1109/TPDS.2023.3247061
70 GHOLAMI TAGHIZADEH R, GHOLAMI TAGHIZADEH R, KHAKPASH F, et al CA-Dedupe: content-aware deduplication in SSDs[J]. The Journal of Supercomputing, 2020, 76 (11): 8901- 8921
doi: 10.1007/s11227-020-03188-z
71 BRODER A Z. Identifying and filtering near-duplicate documents [C]// Annual Symposium on Combinatorial Pattern Matching. Berlin: Springer, 2000: 1–10.
72 WU S, TU Z, ZHOU Y, et al FASTSync: a FAST delta sync scheme for encrypted cloud storage in high-bandwidth network environments[J]. ACM Transactions on Storage, 2023, 19 (4): 1- 22
73 BEZALELI D, GUTMAN J, NOSSENSON R. Using the ZDelta compression algorithm for data reduction in cellular networks [C]// Proceedings of the Future Network and Mobile Summit. Lisboa: IEEE, 2013: 1–7.
74 BETTINI L, DI SALLE A, IOVINO L, et al Supporting reusable model migration with Edelta[J]. Journal of Systems and Software, 2024, 212: 112012
doi: 10.1016/j.jss.2024.112012
75 TAN H, ZOU X, WAN B, et al. SuperDelta: multiple referenced base chunks scheme for fine-grained deduplication backup storage system [C]// Proceedings of the Data Compression Conference. Snowbird: IEEE, 2024: 362–371.
76 ZHANG Y, XIA W, FENG D, et al. Finesse: fine-grained feature locality based fast resemblance detection for post-deduplication delta compression [C]// 17th USENIX Conference on File and Storage Technologies (FAST 19). Boston: [s.n.], 2019: 121–128.
77 ZOU X, DENG C, XIA W, et al. Odess: speeding up resemblance detection for redundancy elimination by fast content-defined sampling [C]// Proceedings of the IEEE 37th International Conference on Data Engineering. Chania: IEEE, 2021: 480–491.
78 HUANG H, WANG P, SU Q, et al. Palantir: hierarchical similarity detection for post-deduplication delta compression [C]// Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. La Jolla: ACM, 2024: 830–845.
79 PARK J, KIM J, KIM Y, et al. DeepSketch: a new machine learning-based reference search technique for post-deduplication delta compression [EB/OL]. (2022−02−17)[2025−11−13]. https://arxiv.org/pdf/2202.10584.
80 ZOU X, XIA W, SHILANE P, et al. Building a high-performance fine-grained deduplication framework for backup storage with high deduplication ratio [C]// Proceedings of the USENIX Annual Technical Conference. Carlsbad: [s.n.], 2022: 19−36.
81 CHENG W, ZHENG T, ZENG L, et al. DPLFS: a dual-mode PCM-based log-structured file system [C]// Proceedings of the IEEE 40th International Conference on Computer Design. Olympic Valley: IEEE, 2022: 324–331.
82 AJDARI M, RAAF P, KISHANI M, et al An enterprise-grade open-source data reduction architecture for all-flash storage systems[J]. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2022, 6 (2): 1- 27
83 ELLAPPAN M, ABIRAMI S Dynamic prime chunking algorithm for data deduplication in cloud storage[J]. KSII Transactions on Internet and Information Systems (TIIS), 2021, 15 (4): 1342- 1359
84 FU Y, XIAO N, CHEN T, et al Fog-to-MultiCloud cooperative eHealth data management with application-aware secure deduplication[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19 (5): 3136- 3148
doi: 10.1109/TDSC.2021.3086089
85 WU S, MAO B, JIANG H, et al PFP: improving the reliability of deduplication-based storage systems with per-file parity[J]. IEEE Transactions on Parallel and Distributed Systems, 2019, 30 (9): 2117- 2129
doi: 10.1109/TPDS.2019.2898942
86 ZHOU Y, FENG D, XIA W, et al. DARM: a deduplication-aware redundancy management approach for reliable-enhanced storage systems [C]// Algorithms and Architectures for Parallel Processing. [S.l.]: Springer, 2018: 445–461.
87 KAN G, JIN C, ZHU H, et al An identity-based proxy re-encryption for data deduplication in cloud[J]. Journal of Systems Architecture, 2021, 121: 102332
doi: 10.1016/j.sysarc.2021.102332
88 SONG M, HUA Z, ZHENG Y, et al Blockchain-based deduplication and integrity auditing over encrypted cloud storage[J]. IEEE Transactions on Dependable and Secure Computing, 2023, 20 (6): 4928- 4945
doi: 10.1109/TDSC.2023.3237221
89 AN B, LI Y, MA J, et al. DCStore: a deduplication-based cloud-of-clouds storage service [C]// Proceedings of the IEEE International Conference on Web Services. Milan: IEEE, 2019: 291–295.
90 ZUO C, WANG F, HUANG P, et al. RepEC-Duet: ensure high reliability and performance for deduplicated and delta-compressed storage systems [C]// Proceedings of the IEEE 37th International Conference on Computer Design. Abu Dhabi: IEEE, 2020: 190–198.
91 ZUO C, WANG F, ZHENG M, et al Ensuring high reliability and performance with low space overhead for deduplicated and delta-compressed storage systems[J]. Concurrency and Computation: Practice and Experience, 2022, 34 (5): e6706
doi: 10.1002/cpe.6706
92 MENG L, GONG X, CHEN Y BAD-FM: backdoor attacks against factorization-machine based neural network for tabular data prediction[J]. Chinese Journal of Electronics, 2024, 33 (4): 1077- 1092
doi: 10.23919/cje.2023.00.041
93 HUANG P, WU Y Teacher-student training approach using an adaptive gain mask for LSTM-based speech enhancement in the airborne noise environment[J]. Chinese Journal of Electronics, 2023, 32 (4): 882- 895
doi: 10.23919/cje.2022.00.307
94 ZHANG R, E H, YUAN L, et al FGM-SPCL: open-set recognition network for medical images based on fine-grained data mixture and spatial position constraint loss[J]. Chinese Journal of Electronics, 2024, 33 (4): 1023- 1033
doi: 10.23919/cje.2023.00.081
95 ZOU B, YANG K, KUI X, et al Anomaly detection for streaming data based on grid-clustering and Gaussian distribution[J]. Information Sciences, 2023, 638: 118989
doi: 10.1016/j.ins.2023.118989
96 SUN T, JIANG B, LI B, et al. SimEnc: a high-performance similarity-preserving encryption approach for deduplication of encrypted Docker images [C]// 2024 USENIX Annual Technical Conference (USENIX ATC 24). Santa Clara: [s.n.], 2024: 615–630.
97 LIN Y, MAO Y, ZHANG Y, et al Secure deduplication schemes for content delivery in mobile edge computing[J]. Computers and Security, 2022, 114: 102602
98 XIAO W, HAO Y, LIANG J, et al Adaptive compression offloading and resource allocation for edge vision computing[J]. IEEE Transactions on Cognitive Communications and Networking, 2024, 10 (6): 2357- 2369
doi: 10.1109/TCCN.2024.3400820
99 CHEN L, GUO C, GONG B, et al A secure cross-domain authentication scheme based on threshold signature for MEC[J]. Journal of Cloud Computing, 2024, 13 (1): 70
doi: 10.1186/s13677-024-00631-x