Journal of ZheJiang University (Engineering Science)  2026, Vol. 60 Issue (2): 287-302    DOI: 10.3785/j.issn.1008-973X.2026.02.007
    
Systematic classification and performance analysis of data deduplication and reduction techniques
Xiaoyan KUI1, Min ZHANG1, Ling XIAO1, Qinsong LI2,*, Liming CHEN3, Wensheng ZHANG4, Beiji ZOU1
1. School of Computer Science and Engineering, Central South University, Changsha 410083, China
2. Big Data Institute, Central South University, Changsha 410083, China
3. École Centrale de Lyon, University of Lyon, Lyon 69134, France
4. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

Abstract  

Data reduction techniques were studied in depth to provide effective solutions for optimizing storage systems and improving data processing efficiency. Based on the distribution characteristics of redundant data and on different application scenarios, existing data reduction techniques were classified, in terms of data similarity and hierarchical structure, into four categories: duplicate reduction, inter-file similarity reduction, intra-file similarity reduction, and hybrid reduction. Because data reduction techniques significantly affect storage efficiency, system response time, data transmission, and reliability, the performance of the different categories was analyzed and summarized, and the advantages and limitations of existing techniques were discussed. Applications of data reduction techniques in multiple scenarios were introduced, and future challenges and research directions were identified.



Key words: data reduction, data deduplication, data compression, storage system, reliability
Received: 20 February 2025      Published: 03 February 2026
CLC:  TP 393  
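Fund:  National Natural Science Foundation of China (U22A2034, 62177047, 62302530); High-End Foreign Expert Recruitment Program of the Ministry of Science and Technology (Guokefazhuan [2023] No. 155); Key Research and Development Program of the Hunan Provincial Department of Science and Technology (2024JK2135); Key Project of Xiangjiang Laboratory (23XJ02005); Key Project of the Hunan Provincial Department of Education (24A0018); Natural Science Foundation of Hunan Province (2023JJ40769); Frontier Cross-Disciplinary Project of Central South University (2023QYJC020).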
Corresponding Authors: Qinsong LI     E-mail: xykui@csu.edu.cn;qinsli.cg@csu.edu.cn
Cite this article:

Xiaoyan KUI,Min ZHANG,Ling XIAO,Qinsong LI,Liming CHEN,Wensheng ZHANG,Beiji ZOU. Systematic classification and performance analysis of data deduplication and reduction techniques. Journal of ZheJiang University (Engineering Science), 2026, 60(2): 287-302.

URL:

https://www.zjujournals.com/eng/10.3785/j.issn.1008-973X.2026.02.007     OR     https://www.zjujournals.com/eng/Y2026/V60/I2/287


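| Category | Target data | Granularity | Scalability | Complexity | Techniques |
|---|---|---|---|---|---|
| Traditional compression | All data | Bytes, strings | | | Huffman coding, dictionary coding |
| Delta compression | Similar data | Bytes, strings | Moderate | Moderate | String matching |
| Data deduplication | Duplicate data | Chunks, files | | | Chunking, fingerprint computation |

Tab.1 Comparison of data reduction methods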
Fig.1 Redundancy distribution at data block level
Fig.2 Deduplication based on hash functions
Fig.3 Fixed-size block-level deduplication
Fig.4 Algorithm of local maximum chunking
Fig.5 Algorithm of asymmetric extremum chunking
Fig.6 Algorithm of rapid asymmetric maximum chunking
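The hash-based, fixed-size block-level deduplication named in the captions of Fig.2 and Fig.3 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `dedup_fixed` and `restore`, the 4096-byte block size, and the in-memory dictionary index are assumptions made for illustration, and SHA-1 is used only because it is the fingerprint function mentioned elsewhere on this page.

```python
import hashlib


def dedup_fixed(data: bytes, block_size: int = 4096):
    """Split data into fixed-size blocks and keep each distinct block once.

    Hypothetical helper: returns (store, recipe), where store maps a
    fingerprint to the block contents and recipe lists the fingerprints
    in the order needed to rebuild the input.
    """
    store = {}   # fingerprint -> unique block contents
    recipe = []  # ordered fingerprints describing the original data
    for off in range(0, len(data), block_size):
        block = data[off:off + block_size]
        fp = hashlib.sha1(block).hexdigest()  # block fingerprint
        store.setdefault(fp, block)           # store only the first copy
        recipe.append(fp)
    return store, recipe


def restore(store, recipe) -> bytes:
    """Rebuild the original data from the unique blocks and the recipe."""
    return b"".join(store[fp] for fp in recipe)
```

With three identical blocks followed by a different one, only two blocks are stored while the recipe keeps four entries. Fixed-size splitting is simple and fast, but an insertion near the start of a file shifts every later block boundary, which is the usual motivation for the content-defined chunking algorithms in the following captions.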
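| Algorithm | Chunk-size variance | Throughput | Minimum chunk size | Maximum chunk size |
|---|---|---|---|---|
| Rabin-CDC | | | Depends on constraints | Depends on constraints |
| LMC | | | $w$ | $256(w - 1)$ |
| AE | | | $w+1$ | Unlimited |
| RAM | Moderate | Very high | $w+1$ | Unlimited |

Tab.2 Comparison of different chunking algorithms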
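The minimum chunk size of $w+1$ listed for AE in Tab.2 follows from how an asymmetric-extremum cut point is chosen: a position becomes an extreme point when its value exceeds everything before it in the chunk, and the chunk is cut once $w$ further positions pass without a larger value. The sketch below is a simplified illustration under stated assumptions, not the algorithm exactly as drawn in Fig.5: it compares single byte values rather than multi-byte intervals, and the name `ae_chunks` is hypothetical.

```python
def ae_chunks(data: bytes, w: int):
    """Yield content-defined chunks using an asymmetric-extremum style rule.

    Assumption: each position's value is its single byte. A cut is placed
    w positions after the current maximum, so every chunk except possibly
    the last has at least w + 1 bytes and no upper bound on its length.
    """
    if w < 1:
        raise ValueError("w must be at least 1")
    n = len(data)
    start = 0
    while start < n:
        max_val = data[start]   # current extreme value in this chunk
        max_pos = start         # position of that extreme value
        cut = n                 # default: the rest of the data is the last chunk
        i = start + 1
        while i < n:
            if data[i] <= max_val:
                if i == max_pos + w:
                    cut = i + 1  # chunk ends at position i, inclusive
                    break
            else:
                max_val = data[i]
                max_pos = i
            i += 1
        yield data[start:cut]
        start = cut
```

Because the cut point depends only on byte values, an edit inside one chunk tends to leave later chunk boundaries unchanged, unlike fixed-size splitting. Concatenating the yielded chunks always reproduces the input.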
Fig.7 Method of differential compression
Fig.8 Storage benefits of various data reduction techniques
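| Scheme | Key idea | Advantages | Limitations |
|---|---|---|---|
| Jingwei[50] | Efficient and adaptive data migration for deduplicated storage systems | Improves storage-space utilization; promotes service adaptability | Relatively high computational cost and complexity; depends on chunk analysis and parameter tuning |
| DLDAFE[51] | Two-layer deduplication that raises the deduplication ratio while balancing performance and chunking effects | Dynamic chunk combination lowers overhead; balances performance and time; weakens the impact of hard chunk boundaries | Increased complexity; parameters must be tuned; exact deduplication in the second layer becomes a bottleneck at large data volumes |
| Light-Dedup[52] | Combines hashing with content comparison and optimizes I/O to achieve fast block deduplication | Improves I/O performance and saves storage overhead; improves deduplication efficiency | Memory usage depends on the server environment; incurs extra index overhead |
| FSDedup[53] | Error-correcting-code-assisted deduplication; fingerprints identify similar data; eliminates highly referenced redundancy | Reduces the read overhead of similarity comparison; eliminates more redundant data | Depends on workload locality; incurs extra computational overhead |
| imDedup[54] | Deduplication on the I/O path to improve the performance of cloud primary storage systems | Reduces latency impact; flexible threshold settings; optimizes memory-cache usage | Dynamic cache adjustment is complex; depends on workload characteristics |

Tab.3 Comparison of schemes based on deduplication rate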
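| Scheme | Key idea | Advantages | Limitations |
|---|---|---|---|
| R-dedup[55] | Alleviates the fingerprinting bottleneck on solid-state storage and improves overall deduplication efficiency | Significantly raises secure hash algorithm 1 (SHA-1) throughput; reduces computation; compatible and scalable; small error | Large memory overhead; I/O limits the room for performance improvement |
| QuickDedup[57] | Efficient deduplication method for virtual machine disk images in cloud environments | High time efficiency; minimal metadata overhead; fast deduplication; widely applicable | Fixed block size limits adaptation to data types and affects the deduplication ratio |
| dCACH[58] | Optimizes the on-disk backup index and alleviates node islands in distributed deduplication | High scalability; high throughput; high storage efficiency | Increased complexity; high resource consumption; depends on components |
| SACRO[59] | Bloom filter accelerates ownership verification and enables fast detection of redundant data | Lowers the false-positive rate; improves chunk locality | Depends on file similarity; complex cache management |
| P-Dedupe[60] | Selective rewriting and cache optimization improve locality and strengthen restore performance | Raises deduplication throughput; parallelizes content-defined chunking and fingerprint generation | High hardware resource requirements; not suitable for all dataset types |

Tab.4 Comparison of schemes based on backup performance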
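| Scheme | Key idea | Advantages | Limitations |
|---|---|---|---|
| TRS[61] | Addresses deduplication fragmentation and avoids retrieving redundant container chunks during restore | Significantly improves restore performance; reduces metadata overhead of reclamation; uses storage space efficiently | Depends on historical information; sensitive to the utilization threshold |
| FGDEFRAG[62] | Uses variable-sized, adaptively located data groups to identify and remove fragmented data precisely | Improves restore performance; reduces rewritten data; identifies fragments precisely; highly adaptable | Parameter sensitive; limited incremental performance; increased storage overhead |
| MFDedup[63] | Optimizes data layout through data classification, preserves backup locality, and alleviates deduplication fragmentation | Improves restore performance; reduces fragmentation; low-overhead GC process; simplifies management and space utilization | Depends on contiguous lifecycles; restore executes sequentially; incurs extra overhead |

Tab.5 Comparison of schemes based on recovery performance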
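| Scheme | Key idea | Advantages | Limitations |
|---|---|---|---|
| HDS[67] | Reduces random access and concurrent metadata overhead, lowers fragmentation, and improves primary storage performance | Reduces metadata overhead; improves storage efficiency; optimizes I/O performance | Limited deduplication coverage; high complexity; difficult cache management; limited scalability |
| FADD[68] | Uses file semantic information (such as file type and size) to guide the deduplication process | Well targeted; reduces system overhead; improves storage performance; highly flexible | High initial investment and implementation complexity; security and privacy concerns |
| BEDD[69] | Formulates the balanced edge data deduplication problem, with an optimal approach for small scales and a suboptimal approach for large scales | Jointly optimizes deduplication ratio, storage benefit, and resource balance under latency constraints | Computational overhead is borne mainly by the cloud service provider |
| CA-Dedupe[70] | Classifies write requests by content and searches for duplicates only within specific classes | Reduces write latency and memory consumption; saves storage space | Increased software complexity; performance depends on file type |

Tab.6 Comparison of schemes based on system overhead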
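| Scheme | Key idea | Advantages | Limitations |
|---|---|---|---|
| RAD[29] | Optimizes GC performance and applies different erasure-coding schemes according to the reliability requirements of data chunks | Improves GC performance, reliability, and storage efficiency | High complexity; trade-off between storage and reliability; large performance overhead; consistency is hard to guarantee |
| DARM[86] | Deduplication-aware redundancy management that uses semantic information to improve storage reliability | Improves reliability; lowers storage overhead | Implementation complexity; performance depends on correct parameter configuration |
| ASDDS[88] | Considers privacy leakage and audit forgery to improve data security and trustworthiness | Lowers users' key storage cost; guarantees key recoverability and the reliability of audit results | Implementation complexity; performance depends on correct parameter configuration |
| DCStore[89] | Provides a solution for outsourcing data to the cloud that achieves cost-effectiveness and high availability | Improves availability; lowers storage cost; improves access performance; strengthens fault tolerance | Implementation complexity; depends on cloud service providers; limited by network bandwidth |
| RepEC-Duet[90] | Combines deduplication and delta compression to improve storage system reliability and performance | Efficient use of storage space; improved data restore performance; maintains cache locality | Increased complexity; cache management challenges; must balance performance and reliability |
| RepEC+[91] | Combines replication and erasure coding with history-driven delta compression to improve reliability and restore efficiency | Efficient use of storage space; improved data restore performance; reduces cyclic fragmentation | Increased complexity; cache management challenges; must balance restore performance and storage overhead |

Tab.7 Comparison of data reduction technology schemes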
[1]   JIANG P, SINHA S, ALDAPE K, et al Big data in basic and translational cancer research[J]. Nature Reviews Cancer, 2022, 22 (11): 625- 639
doi: 10.1038/s41568-022-00502-0
[2]   ACCIARINI C, CAPPA F, BOCCARDELLI P, et al How can organizations leverage big data to innovate their business models? A systematic literature review[J]. Technovation, 2023, 123: 102713
doi: 10.1016/j.technovation.2023.102713
[3]   International Data Corporation. Worldwide IDC global datasphere forecast, 2024–2028: AI everywhere, but upsurge in data will take time [EB/OL]. (2024−05−31)[2025−11−12]. https://my.idc.com/getdoc.jsp?containerId=US52712424&pageType=PRINTFRIENDLY.
[4]   DU Yunxiao, CHEN Ke, SHOU Lidan, et al LazyStore: write-optimized key-value storage system based on hybrid storage architecture[J]. Journal of Software, 2025, 36 (2): 805- 829 (in Chinese)
[5]   LIU M, PAN L, LIU S Cost optimization for cloud storage from user perspectives: recent advances, taxonomy, and survey[J]. ACM Computing Surveys, 2023, 55 (13s): 1- 37
[6]   ZHANG T, SHEN J, LAI C F, et al Multi-server assisted data sharing supporting secure deduplication for metaverse healthcare systems[J]. Future Generation Computer Systems, 2023, 140: 299- 310
doi: 10.1016/j.future.2022.10.031
[7]   OH M, LEE S, JUST S, et al. TiDedup: a new distributed deduplication architecture for Ceph [C]// 2023 USENIX Annual Technical Conference. Boston: [s.n.], 2023: 117–131.
[8]   LIN L, DENG Y, ZHOU Y, et al InDe: an inline data deduplication approach via adaptive detection of valid container utilization[J]. ACM Transactions on Storage, 2023, 19 (1): 1- 27
[9]   SHAH M, YU X, DI S, et al. Lightweight Huffman coding for efficient GPU compression [C]// Proceedings of the 37th ACM International Conference on Supercomputing. Orlando: ACM, 2023: 99–110.
[10]   MING Y, WANG C, LIU H, et al Blockchain-enabled efficient dynamic cross-domain deduplication in edge computing[J]. IEEE Internet of Things Journal, 2022, 9 (17): 15639- 15656
doi: 10.1109/JIOT.2022.3150042
[11]   XIA W, WEI C, LI Z, et al NetSync: a network adaptive and deduplication-inspired delta synchronization approach for cloud storage services[J]. IEEE Transactions on Parallel and Distributed Systems, 2022, 33 (10): 2554- 2570
[12]   LIU X, AN P, CHEN Y, et al An improved lossless image compression algorithm based on Huffman coding[J]. Multimedia Tools and Applications, 2022, 81 (4): 4781- 4795
doi: 10.1007/s11042-021-11017-5
[13]   ZHANG Y, ZHANG F, LI H, et al. CompressStreamDB: fine-grained adaptive stream processing without decompression [C]// Proceedings of the IEEE 39th International Conference on Data Engineering. Anaheim: IEEE, 2023: 408–422.
[14]   BACS A, MUSAEV S, RAZAVI K, et al. DUPEFS: leaking data over the network with filesystem deduplication side channels [C]// 20th USENIX Conference on File and Storage Technologies. Santa Clara: [s.n.], 2022: 281–296.
[15]   NI F, LIN X, JIANG S. SS-CDC: a two-stage parallel content-defined chunking for deduplicating backup storage [C]// Proceedings of the 12th ACM International Conference on Systems and Storage. Haifa: ACM, 2019: 86–96.
[16]   ZHANG Y, FU M, WU X, et al Improving restore performance of packed datasets in deduplication systems via reducing persistent fragmented chunks[J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 31 (7): 1651- 1664
doi: 10.1109/TPDS.2020.2972898
[17]   WU S, DU C, ZHANG W, et al DedupHR: exploiting content locality to alleviate read/write interference in deduplication-based flash storage[J]. IEEE Transactions on Computers, 2022, 71 (6): 1332- 1343
[18]   CAO Z, LIU S, WU F, et al. Sliding look-back window assisted data chunk rewriting for improving deduplication restore performance [C]// Proceedings of the 17th USENIX Conference on File and Storage Technologies. Boston: ACM, 2019: 129–142.
[19]   ZHANG D, DENG Y, ZHOU Y, et al Improving the performance of deduplication-based backup systems via container utilization based hot fingerprint entry distilling[J]. ACM Transactions on Storage, 2021, 17 (4): 1- 23
[20]   XU L J, HAO R, YU J, et al Secure deduplication for big data with efficient dynamic ownership updates[J]. Computers and Electrical Engineering, 2021, 96: 107531
doi: 10.1016/j.compeleceng.2021.107531
[21]   COGO V, PAULO J, BESSANI A GenoDedup: similarity-based deduplication and delta-encoding for genome sequencing data[J]. IEEE Transactions on Computers, 2021, 70 (5): 669- 681
doi: 10.1109/TC.2020.2994774
[22]   VESTERGAARD R, LUCANI D E, ZHANG Q. A randomly accessible lossless compression scheme for time-series data [C]// Proceedings of the IEEE INFOCOM 2020 - IEEE Conference on Computer Communications. Toronto: IEEE, 2020: 2145–2154.
[23]   XIA W, PU L, ZOU X, et al The design of fast and lightweight resemblance detection for efficient post-deduplication delta compression[J]. ACM Transactions on Storage, 2023, 19 (3): 1- 30
[24]   SHARMA G Analysis of Huffman coding and Lempel–Ziv–Welch (LZW) coding as data compression techniques[J]. International Journal of Scientific Research in Computer Science and Engineering, 2020, 8 (1): 37- 44
[25]   PERIASAMY J K, LATHA B Efficient hash function–based duplication detection algorithm for data deduplication deduction and reduction[J]. Concurrency and Computation: Practice and Experience, 2021, 33 (3): e5213
doi: 10.1002/cpe.5213
[26]   AHMED S T, GEORGE L E Lightweight hash-based de-duplication system using the self detection of most repeated patterns as chunks divisors[J]. Journal of King Saud University - Computer and Information Sciences, 2022, 34 (7): 4669- 4678
doi: 10.1016/j.jksuci.2021.04.005
[27]   JIANG T, YUAN X, CHEN Y, et al FuzzyDedup: secure fuzzy deduplication for cloud storage[J]. IEEE Transactions on Dependable and Secure Computing, 2023, 20 (3): 2466- 2483
doi: 10.1109/TDSC.2022.3185313
[28]   LI Y, TIAN C, GUO F, et al. ElasticBF: elastic Bloom filter with hotness awareness for boosting read performance in large key-value stores [C]// 2019 USENIX Annual Technical Conference. Renton: [s.n.], 2019: 739–752.
[29]   LIU T, HE X, ALIBHAI S, et al. Reference-counter aware deduplication in erasure-coded distributed storage system [C]// Proceedings of the IEEE International Conference on Networking, Architecture and Storage. Chongqing: IEEE, 2018: 1–10.
[30]   NI F, JIANG S. RapidCDC: leveraging duplicate locality to accelerate chunking in CDC-based deduplication systems [C]// Proceedings of the ACM Symposium on Cloud Computing. Santa Cruz: ACM, 2019: 220–232.
[31]   ZHANG G, XIE H, YANG Z, et al BDKM: a blockchain-based secure deduplication scheme with reliable key management[J]. Neural Processing Letters, 2022, 54 (4): 2657- 2674
doi: 10.1007/s11063-021-10450-9
[32]   XIA W, ZHOU Y, JIANG H, et al. FastCDC: a fast and efficient content-defined chunking approach for data deduplication [C]// 2016 USENIX Annual Technical Conference. Denver: [s.n.], 2016: 101–114.
[33]   JIN X, LIU H, YE C, et al Accelerating content-defined chunking for data deduplication based on speculative jump[J]. IEEE Transactions on Parallel and Distributed Systems, 2023, 34 (9): 2568- 2579
doi: 10.1109/TPDS.2023.3290770
[34]   XIA W, ZOU X, JIANG H, et al The design of fast content-defined chunking for data deduplication based storage systems[J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 31 (9): 2017- 2031
doi: 10.1109/TPDS.2020.2984632
[35]   BJØRNER N, BLASS A, GUREVICH Y Content-dependent chunking for differential compression, the local maximum approach[J]. Journal of Computer and System Sciences, 2010, 76 (3/4): 154- 203
[36]   ZHANG Y, YUAN Y, FENG D, et al Improving restore performance for in-line backup system combining deduplication and delta compression[J]. IEEE Transactions on Parallel and Distributed Systems, 2020, 31 (10): 2302- 2314
doi: 10.1109/TPDS.2020.2991030
[37]   GARG S, SINGH R, OBAIDAT M S, et al Statistical vertical reduction-based data abridging technique for big network traffic dataset[J]. International Journal of Communication Systems, 2020, 33 (4): e4249
doi: 10.1002/dac.4249
[38]   RANJAN R Canonical Huffman coding based image compression using wavelet[J]. Wireless Personal Communications, 2021, 117 (3): 2193- 2206
doi: 10.1007/s11277-020-07967-y
[39]   HUSSEIN A M, IDREES A K, COUTURIER R A distributed prediction–compression-based mechanism for energy saving in IoT networks[J]. The Journal of Supercomputing, 2023, 79 (15): 16963- 16999
doi: 10.1007/s11227-023-05317-w
[40]   NIU B, CAO X, WEI Z, et al Entropy optimized deep feature compression[J]. IEEE Signal Processing Letters, 2021, 28: 324- 328
doi: 10.1109/LSP.2021.3052097
[41]   COLLET Y. Finite state entropy [EB/OL]. [2025–11–13]. https://fastcompression.blogspot.com/2013/
[42]   WELCH T A. A technique for high-performance data compression[J]. Computer, 1984, 17 (6): 8- 19
[43]   MAJID A, ROBERTS S G, CILISSEN L, et al Differential coding of perception in the world’s languages[J]. Proceedings of the National Academy of Sciences of the United States of America, 2018, 115 (45): 11369- 11376
[44]   RAJKUMAR K, HARIHARAN U, DHANAKOTI V, et al A secure framework for managing data in cloud storage using rapid asymmetric maximum based dynamic size chunking and fuzzy logic for deduplication[J]. Wireless Networks, 2024, 30 (1): 321- 334
doi: 10.1007/s11276-023-03448-9
[45]   LI M, WANG H, YANG L, et al Fast hybrid dimensionality reduction method for classification based on feature selection and grouped feature extraction[J]. Expert Systems with Applications, 2020, 150: 113277
doi: 10.1016/j.eswa.2020.113277
[46]   ZHANG B, WANG C, ZHOU B B, et al DCDedupe: selective deduplication and delta compression with effective routing for distributed storage[J]. Journal of Grid Computing, 2018, 16 (2): 195- 209
doi: 10.1007/s10723-018-9429-3
[47]   ZHANG Y, JIANG H, FENG D, et al. LoopDelta: embedding locality-aware opportunistic delta compression in inline deduplication for highly efficient data reduction [C]// 2023 USENIX Annual Technical Conference. Boston: [s.n.], 2023: 133–148.
[48]   TAN H, ZHANG Z, ZOU X, et al. Exploring the potential of fast delta encoding: marching to a higher compression ratio [C]// Proceedings of the IEEE International Conference on Cluster Computing. Kobe: IEEE, 2020: 198–208.
[49]   XIA J, CHENG G, LUO L, et al The doctrine of MEAN: realizing deduplication storage at unreliable edge[J]. IEEE Transactions on Parallel and Distributed Systems, 2023, 34 (10): 2811- 2826
doi: 10.1109/TPDS.2023.3305460
[50]   CHENG G, GUO D, LUO L, et al. Jingwei: an efficient and adaptable data migration strategy for deduplicated storage systems [C]// Proceedings of the IEEE INFOCOM 2022 - IEEE Conference on Computer Communications. London: IEEE, 2022: 1659–1668.
[51]   WANG Qingsong, GE Hui Double layer deduplication algorithm based on fingerprint extremum[J]. Journal of Liaoning University: Natural Sciences Edition, 2018, 45 (3): 201- 207 (in Chinese)
[52]   QIU J, PAN Y, XIA W, et al. Light-Dedup: a light-weight inline deduplication framework for non-volatile memory file systems [C]// 2023 USENIX Annual Technical Conference. Boston: [s.n.], 2023: 101−116.
[53]   DU C, LIN Z, WU S, et al FSDedup: feature-aware and selective deduplication for improving performance of encrypted non-volatile main memory[J]. ACM Transactions on Storage, 2024, 20 (4): 1- 33
[54]   DENG C, CHEN Q, ZOU X, et al. imDedup: a lossless deduplication scheme to eliminate fine-grained redundancy among images [C]// Proceedings of the IEEE 38th International Conference on Data Engineering. Kuala Lumpur: IEEE, 2022: 1071–1084.
[55]   WANG Longxiang, DONG Kai, WANG Pengbo, et al R-dedup: a performance improvement strategy for fingerprint calculation of data de-duplication[J]. Journal of Xi’an Jiaotong University, 2021, 55 (1): 43- 51 (in Chinese)
[56]   XIANG L, ZHAO X, RAO J, et al. Characterizing the performance of Intel Optane persistent memory: a close look at its on-DIMM buffering [C]// Proceedings of the Seventeenth European Conference on Computer Systems. Rennes: ACM, 2022: 488–505.
[57]   SAHARAN S, SOMANI G, GUPTA G, et al QuickDedup: efficient VM deduplication in cloud computing environments[J]. Journal of Parallel and Distributed Computing, 2020, 139: 18- 31
doi: 10.1016/j.jpdc.2020.01.002
[58]   DAGNAW G, ZHOU K, WANG H dCACH: content aware clustered and hierarchical distributed deduplication[J]. Journal of Software Engineering and Applications, 2019, 12 (11): 460- 490
doi: 10.4236/jsea.2019.1211029
[59]   DAGNAW G, ZHOU K, WANG H SACRO: solid state drive-assisted chunk caching for restore optimization[J]. Concurrency and Computation: Practice and Experience, 2023, 35 (18): e6162
doi: 10.1002/cpe.6162
[60]   XIA W, FENG D, JIANG H, et al Accelerating content-defined-chunking based data deduplication by exploiting parallelism[J]. Future Generation Computer Systems, 2019, 98: 406- 418
doi: 10.1016/j.future.2019.02.008
[61]   LIN L, DENG Y, ZHOU Y. Improving restore performance of deduplication systems via a greedy rewriting scheme [C]// Proceedings of the IEEE 27th International Conference on Parallel and Distributed Systems. Beijing: IEEE, 2022: 291–298.
[62]   TAN Y, WANG B, WEN J, et al Improving restore performance in deduplication-based backup systems via a fine-grained defragmentation approach[J]. IEEE Transactions on Parallel and Distributed Systems, 2018, 29 (10): 2254- 2267
doi: 10.1109/TPDS.2018.2828842
[63]   ZOU X, YUAN J, SHILANE P, et al From hyper-dimensional structures to linear structures: maintaining deduplicated data’s locality[J]. ACM Transactions on Storage, 2022, 18 (3): 1- 28
[64]   ZOU X, YUAN J, SHILANE P, et al. The dilemma between deduplication and locality: can both be achieved? [C]// 19th USENIX Conference on File and Storage Technologies (FAST 21). [S.l.]: USENIX Association, 2021: 171−185.
[65]   XIAO L, ZOU B, ZHU C, et al ESDedup: an efficient and secure deduplication scheme based on data similarity and blockchain for cloud-assisted medical storage systems[J]. The Journal of Supercomputing, 2023, 79 (3): 2932- 2960
doi: 10.1007/s11227-022-04746-3
[66]   LIU J, CHAI Y P, QIN X, et al Endurable SSD-based read cache for improving the performance of selective restore from deduplication systems[J]. Journal of Computer Science and Technology, 2018, 33 (1): 58- 78
doi: 10.1007/s11390-018-1808-5
[67]   GODAVARI A, SUDHAKAR C, RAMESH T Hybrid deduplication system: a block-level similarity-based approach[J]. IEEE Systems Journal, 2021, 15 (3): 3860- 3870
doi: 10.1109/JSYST.2020.3012702
[68]   GODAVARI A, SUDHAKAR C, RAMESH T File semantic aware primary storage deduplication system[J]. IETE Journal of Research, 2023, 69 (11): 7945- 7957
doi: 10.1080/03772063.2022.2050306
[69]   LUO R, JIN H, HE Q, et al Enabling balanced data deduplication in mobile edge computing[J]. IEEE Transactions on Parallel and Distributed Systems, 2023, 34 (5): 1420- 1431
doi: 10.1109/TPDS.2023.3247061
[70]   GHOLAMI TAGHIZADEH R, GHOLAMI TAGHIZADEH R, KHAKPASH F, et al CA-Dedupe: content-aware deduplication in SSDs[J]. The Journal of Supercomputing, 2020, 76 (11): 8901- 8921
doi: 10.1007/s11227-020-03188-z
[71]   BRODER A Z. Identifying and filtering near-duplicate documents [C]// Annual Symposium on Combinatorial Pattern Matching. Berlin: Springer, 2000: 1–10.
[72]   WU S, TU Z, ZHOU Y, et al FASTSync: a FAST delta sync scheme for encrypted cloud storage in high-bandwidth network environments[J]. ACM Transactions on Storage, 2023, 19 (4): 1- 22
[73]   BEZALELI D, GUTMAN J, NOSSENSON R. Using the ZDelta compression algorithm for data reduction in cellular networks [C]// Proceedings of the Future Network and Mobile Summit. Lisboa: IEEE, 2013: 1–7.
[74]   BETTINI L, DI SALLE A, IOVINO L, et al Supporting reusable model migration with Edelta[J]. Journal of Systems and Software, 2024, 212: 112012
doi: 10.1016/j.jss.2024.112012
[75]   TAN H, ZOU X, WAN B, et al. SuperDelta: multiple referenced base chunks scheme for fine-grained deduplication backup storage system [C]// Proceedings of the Data Compression Conference. Snowbird: IEEE, 2024: 362–371.
[76]   ZHANG Y, XIA W, FENG D, et al. Finesse: fine-grained feature locality based fast resemblance detection for post-deduplication delta compression [C]// 17th USENIX Conference on File and Storage Technologies (FAST 19). Boston: [s.n.], 2019: 121–128.
[77]   ZOU X, DENG C, XIA W, et al. Odess: speeding up resemblance detection for redundancy elimination by fast content-defined sampling [C]// Proceedings of the IEEE 37th International Conference on Data Engineering. Chania: IEEE, 2021: 480–491.
[78]   HUANG H, WANG P, SU Q, et al. Palantir: hierarchical similarity detection for post-deduplication delta compression [C]// Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. La Jolla: ACM, 2024: 830–845.
[79]   PARK J, KIM J, KIM Y, et al. DeepSketch: a new machine learning-based reference search technique for post-deduplication delta compression [EB/OL]. (2022−02−17)[2025−11−13]. https://arxiv.org/pdf/2202.10584.
[80]   ZOU X, XIA W, SHILANE P, et al. Building a high-performance fine-grained deduplication framework for backup storage with high deduplication ratio [C]// Proceedings of the USENIX Annual Technical Conference. Carlsbad: [s.n.], 2022: 19−36.
[81]   CHENG W, ZHENG T, ZENG L, et al. DPLFS: a dual-mode PCM-based log-structured file system [C]// Proceedings of the IEEE 40th International Conference on Computer Design. Olympic Valley: IEEE, 2022: 324–331.
[82]   AJDARI M, RAAF P, KISHANI M, et al An enterprise-grade open-source data reduction architecture for all-flash storage systems[J]. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2022, 6 (2): 1- 27
[83]   ELLAPPAN M, ABIRAMI S Dynamic prime chunking algorithm for data deduplication in cloud storage[J]. KSII Transactions on Internet and Information Systems (TIIS), 2021, 15 (4): 1342- 1359
[84]   FU Y, XIAO N, CHEN T, et al Fog-to-MultiCloud cooperative eHealth data management with application-aware secure deduplication[J]. IEEE Transactions on Dependable and Secure Computing, 2022, 19 (5): 3136- 3148
doi: 10.1109/TDSC.2021.3086089
[85]   WU S, MAO B, JIANG H, et al PFP: improving the reliability of deduplication-based storage systems with per-file parity[J]. IEEE Transactions on Parallel and Distributed Systems, 2019, 30 (9): 2117- 2129
doi: 10.1109/TPDS.2019.2898942
[86]   ZHOU Y, FENG D, XIA W, et al. DARM: a deduplication-aware redundancy management approach for reliable-enhanced storage systems [C]// Algorithms and Architectures for Parallel Processing. [S.l.]: Springer, 2018: 445–461.
[87]   KAN G, JIN C, ZHU H, et al An identity-based proxy re-encryption for data deduplication in cloud[J]. Journal of Systems Architecture, 2021, 121: 102332
doi: 10.1016/j.sysarc.2021.102332
[88]   SONG M, HUA Z, ZHENG Y, et al Blockchain-based deduplication and integrity auditing over encrypted cloud storage[J]. IEEE Transactions on Dependable and Secure Computing, 2023, 20 (6): 4928- 4945
doi: 10.1109/TDSC.2023.3237221
[89]   AN B, LI Y, MA J, et al. DCStore: a deduplication-based cloud-of-clouds storage service [C]// Proceedings of the IEEE International Conference on Web Services. Milan: IEEE, 2019: 291–295.
[90]   ZUO C, WANG F, HUANG P, et al. RepEC-Duet: ensure high reliability and performance for deduplicated and delta-compressed storage systems [C]// Proceedings of the IEEE 37th International Conference on Computer Design. Abu Dhabi: IEEE, 2020: 190–198.
[91]   ZUO C, WANG F, ZHENG M, et al Ensuring high reliability and performance with low space overhead for deduplicated and delta-compressed storage systems[J]. Concurrency and Computation: Practice and Experience, 2022, 34 (5): e6706
doi: 10.1002/cpe.6706
[92]   MENG L, GONG X, CHEN Y BAD-FM: backdoor attacks against factorization-machine based neural network for tabular data prediction[J]. Chinese Journal of Electronics, 2024, 33 (4): 1077- 1092
doi: 10.23919/cje.2023.00.041
[93]   HUANG P, WU Y Teacher-student training approach using an adaptive gain mask for LSTM-based speech enhancement in the airborne noise environment[J]. Chinese Journal of Electronics, 2023, 32 (4): 882- 895
doi: 10.23919/cje.2022.00.307
[94]   ZHANG R, E H, YUAN L, et al FGM-SPCL: open-set recognition network for medical images based on fine-grained data mixture and spatial position constraint loss[J]. Chinese Journal of Electronics, 2024, 33 (4): 1023- 1033
doi: 10.23919/cje.2023.00.081
[95]   ZOU B, YANG K, KUI X, et al Anomaly detection for streaming data based on grid-clustering and Gaussian distribution[J]. Information Sciences, 2023, 638: 118989
doi: 10.1016/j.ins.2023.118989
[96]   SUN T, JIANG B, LI B, et al. SimEnc: a high-performance similarity-preserving encryption approach for deduplication of encrypted Docker images [C]// 2024 USENIX Annual Technical Conference (USENIX ATC 24). Santa Clara: [s.n.], 2024: 615–630.
[97]   LIN Y, MAO Y, ZHANG Y, et al Secure deduplication schemes for content delivery in mobile edge computing[J]. Computers and Security, 2022, 114: 102602
[98]   XIAO W, HAO Y, LIANG J, et al Adaptive compression offloading and resource allocation for edge vision computing[J]. IEEE Transactions on Cognitive Communications and Networking, 2024, 10 (6): 2357- 2369
doi: 10.1109/TCCN.2024.3400820
[99]   CHEN L, GUO C, GONG B, et al A secure cross-domain authentication scheme based on threshold signature for MEC[J]. Journal of Cloud Computing, 2024, 13 (1): 70
doi: 10.1186/s13677-024-00631-x