Facilitating external sorting on SMR-based large-scale storage systems
To improve efficiency and alleviate the write amplification of external merge sort on SMR drives, this study proposes the SMR-based External Merge Sort (SMR-EMS) strategy, which revisits the storage-centric computing concept while taking into account the characteristics of both external merge sort and SMR drives.

Figure: System architecture of the proposed SMR-EMS strategy.
Technology Overview
Shingled magnetic recording (SMR) was proposed to increase the areal density of hard disk drives. A major performance issue of SMR is its sequential-write constraint. External merge sort is an important data processing technique, and its performance is greatly affected by this constraint. The proposed SMR-EMS strategy reduces the latency of external merge sort by an average of 90.8%.
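To make the sequential-write constraint concrete, the sketch below models a single shingled band as a zone with a write pointer: writes are accepted only at the pointer, and updating an already-written block is modeled as reading the zone, patching the block in memory, and rewriting the whole zone. This is a deliberately simplified model assumed for illustration, and the names SequentialZone and update_in_place are hypothetical. Real SMR drives differ (for example, drive-managed SMR drives buffer writes in a media cache), and the amplification figure the model produces is not taken from the paper.

```python
class SequentialZone:
    """Simplified model of one SMR band (zone) with a write pointer.

    Assumption for illustration: a block may only be written at the
    current write pointer, mirroring the sequential-write constraint.
    """

    def __init__(self, num_blocks: int) -> None:
        self.blocks: list[object] = [None] * num_blocks
        self.write_pointer = 0
        self.blocks_written = 0  # physical block writes, for amplification accounting

    def append(self, data: object) -> int:
        """Write one block at the write pointer and advance it."""
        if self.write_pointer >= len(self.blocks):
            raise OSError("zone is full")
        offset = self.write_pointer
        self.blocks[offset] = data
        self.write_pointer += 1
        self.blocks_written += 1
        return offset

    def write_at(self, offset: int, data: object) -> None:
        """Reject any write that is not exactly at the write pointer."""
        if offset != self.write_pointer:
            raise ValueError(
                f"non-sequential write at {offset}; write pointer is {self.write_pointer}"
            )
        self.append(data)

    def reset(self) -> None:
        """Rewind the write pointer so the zone can be rewritten from the start."""
        self.write_pointer = 0


def update_in_place(zone: SequentialZone, offset: int, data: object) -> int:
    """Update one block by read-modify-write of the whole zone.

    Returns how many physical block writes the single logical update cost.
    """
    before = zone.blocks_written
    valid = zone.blocks[: zone.write_pointer]  # read every written block
    valid[offset] = data                       # patch the target block in memory
    zone.reset()
    for block in valid:                        # rewrite the zone sequentially
        zone.append(block)
    return zone.blocks_written - before
```

In this model, changing one block of a full eight-block zone costs eight physical block writes. That is the kind of write amplification the SMR-EMS strategy aims to alleviate.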
Applications & Benefits
The proposed SMR-EMS strategy comprises three components: an active-sort caching design that actively sorts incoming write traffic within the SMR drive, a sorted-runs mapping scheme that facilitates data retrieval, and a locality-aware space allocator that lowers seek latency during both read and write operations. With these components, SMR-EMS reduces external sorting latency by an average of 90.80%.
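The idea of sorting write traffic as it arrives and tracking the resulting runs can be illustrated with a host-side simulation. The sketch below is hypothetical and is not the paper's implementation: ActiveSortCache buffers incoming keys, sorts them once the buffer reaches an assumed capacity, and appends each sorted run to an append-only list standing in for sequentially written SMR space. RunEntry is an assumed form of run mapping (start offset, length, and key range) that lets a range read skip runs whose keys cannot match. The locality-aware space allocator is not modeled.

```python
import bisect
from dataclasses import dataclass


@dataclass
class RunEntry:
    """Hypothetical mapping-table entry describing one sorted run."""
    start: int    # offset of the run's first key in the append-only log
    length: int   # number of keys in the run
    min_key: int
    max_key: int


class ActiveSortCache:
    """Illustrative sketch: buffer incoming keys, sort them when the cache
    fills, and append each sorted run sequentially to a log."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.buffer: list[int] = []
        self.log: list[int] = []             # stands in for sequentially written SMR space
        self.run_table: list[RunEntry] = []  # stands in for the sorted-runs mapping

    def write(self, key: int) -> None:
        self.buffer.append(key)
        if len(self.buffer) >= self.capacity:
            self.flush()

    def flush(self) -> None:
        """Sort the buffered keys and append them as one sequential run."""
        if not self.buffer:
            return
        run = sorted(self.buffer)
        start = len(self.log)
        self.log.extend(run)
        self.run_table.append(RunEntry(start, len(run), run[0], run[-1]))
        self.buffer.clear()

    def runs_overlapping(self, lo: int, hi: int) -> list[RunEntry]:
        """Use the mapping table to skip runs whose key range misses [lo, hi]."""
        return [r for r in self.run_table if r.max_key >= lo and r.min_key <= hi]

    def read_range(self, lo: int, hi: int) -> list[int]:
        """Collect keys in [lo, hi] by binary-searching only candidate runs."""
        result: list[int] = []
        for r in self.runs_overlapping(lo, hi):
            run = self.log[r.start : r.start + r.length]
            i = bisect.bisect_left(run, lo)
            j = bisect.bisect_right(run, hi)
            result.extend(run[i:j])
        return sorted(result)
```

Feeding the nine keys 9, 3, 7, 1, 8, 2, 6, 4, 5 into a cache with capacity four (followed by a final flush) produces three runs. A lookup for key 9 consults only the first run, because the mapping shows that the other two runs top out at 8 and 5.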
Abstract:
In the big data era, the capability to process and store sheer amounts of data has become a necessity for data-intensive computing. To meet the requirements of big data processing, the storage-centric computing concept of processing data within storage devices has gained popularity over the years, because the latency and energy consumed by moving data between host systems and storage devices gradually exceed those of processing the data. One of the fundamental data processing techniques for data-intensive computing is external sorting, which is widely used in database management systems (DBMS) and the Hadoop framework. On the other hand, to store ever-increasing volumes of data, shingled magnetic recording (SMR) drives have been proposed to increase the areal density of conventional hard disk drives (HDDs) by overlapping adjacent tracks. SMR drives are widely regarded as a promising technology for big data applications because they can boost the capacity of HDDs without significant technology changes. Nevertheless, the overlapped track layout of SMR drives imposes a sequential write constraint on incoming write traffic, which worsens the efficiency of external sorting on SMR drives. This observation motivates us to propose an SMR-based External Merge Sort (SMR-EMS) strategy for SMR-based large-scale storage systems, with the goals of alleviating the negative impacts of the sequential write constraint and enhancing the performance of external sorting on SMR drives by utilizing the concept of storage-centric computing. Experiments were conducted to demonstrate the capability of the proposed strategy to improve the efficiency of external merge sort on SMR drives.
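For reference, the conventional two-phase external merge sort named in the abstract can be sketched as follows. This is the textbook algorithm, not SMR-EMS: phase 1 splits the input into chunks that fit an assumed in-memory budget (the hypothetical parameter memory_limit), sorts each chunk, and writes it to its own temporary file as a sorted run; phase 2 merges all runs with a heap via Python's heapq.merge, reading each run front to back.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator


def _write_run(sorted_chunk: list[int], directory: str) -> str:
    """Write one sorted run to its own file, sequentially, one key per line."""
    fd, path = tempfile.mkstemp(dir=directory, suffix=".run")
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{k}\n" for k in sorted_chunk)
    return path


def _read_run(path: str) -> Iterator[int]:
    """Stream a run back from disk in key order."""
    with open(path) as f:
        for line in f:
            yield int(line)


def external_merge_sort(keys: Iterable[int], memory_limit: int) -> list[int]:
    """Two-phase external merge sort (textbook version).

    Phase 1: cut the input into chunks of at most `memory_limit` keys,
    sort each chunk in memory, and write it out as a sorted run.
    Phase 2: k-way merge all runs with a heap.
    """
    with tempfile.TemporaryDirectory() as tmp:
        run_paths: list[str] = []
        chunk: list[int] = []
        for k in keys:
            chunk.append(k)
            if len(chunk) == memory_limit:
                run_paths.append(_write_run(sorted(chunk), tmp))
                chunk = []
        if chunk:
            run_paths.append(_write_run(sorted(chunk), tmp))
        # heapq.merge lazily merges the already-sorted run iterators;
        # list() drains them before the temporary directory is removed.
        return list(heapq.merge(*(_read_run(p) for p in run_paths)))
```

For brevity the merged output is returned as an in-memory list. A real external sort would stream it back to storage, so both the run files and the final output contribute to the write traffic that reaches the drive.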

Authors: Chih-Hsuan Chen, Shuo-Han Chen, Yu-Pei Liang, Tseng-Yi Chen, Tsan-sheng Hsu, Hsin-Wen Wei, Wei-Kuan Shih
Year: 2021
Source publication: Future Generation Computer Systems, Volume 116, March 2021, Pages 333-348
Subfield highest percentile: 99% (Hardware and Architecture, #2/167)
https://www.sciencedirect.com/science/article/pii/S0167739X20330168

