
StoragePolicySatisfier (SPS) allows users to track and satisfy the storage policy requirement of a given file/directory in HDFS.

Heterogeneous storage in HDFS introduced the concept of storage policies. These policies can be set on a directory or file to specify the user's preference for where the physical blocks should be stored. When the user sets the storage policy before writing data, the blocks can take advantage of the policy's storage preferences and are placed accordingly. If the user sets the storage policy after writing and completing the file, the blocks will already have been written with the default storage policy (nothing but DISK), and the user has to run the Mover tool explicitly, specifying all such file names as a list. Another scenario: when the user renames a file that has an effective storage policy (inherited from its parent directory) into a directory with a different storage policy, the inherited policy is not copied from the source, so the file takes effect under the destination file/dir's parent storage policy. This rename operation is just a metadata change in the Namenode; the physical blocks still remain placed per the source storage policy. In some distributed system scenarios (ex: HBase) it would be difficult to collect all such files and run the tool, as different nodes can write files independently and files can have different paths. Tracking all such business-logic-based file names from distributed nodes (ex: region servers) and running the Mover tool would be difficult for admins. The proposal here is to provide an API from the Namenode itself to trigger storage policy satisfaction; a daemon thread inside the Namenode would track such calls and send movement commands to the Datanodes. Will post the detailed design thoughts document soon.
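As a quick illustration of the pre-SPS workflow described above (the path and policy are hypothetical; assumes the cluster has ARCHIVE storage configured on the Datanodes):

    # Policy set after the file was written: only Namenode metadata changes.
    hdfs storagepolicies -setStoragePolicy -path /data/logs -policy COLD

    # Replicas stay on their original storage type until the Mover tool is
    # run explicitly against each affected path.
    hdfs mover -p /data/logs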

See the "Storage Policy Satisfier (SPS)" section in the Archival Storage guide for detailed usage. If administrator is looking to run Mover tool explicitly, then he/she should make sure to disable SPS first and then run Mover. SPS should be started outside Namenode using "hdfs -daemon start sps". hbase Refactoring idea: Move classes into Client, Master, and RegionServer packages jira Issue Comment Edited: (HADOOP-1398) Add in-memory caching of data. The configs can be disabled dynamically without restarting Namenode. It can be enabled by setting ‘.mode’ to ‘external’ in hdfs-site.xml.

A user can specify a file/directory path by invoking the "hdfs storagepolicies -satisfyStoragePolicy -path <path>" command or the HdfsAdmin#satisfyStoragePolicy(path) API. For the blocks which have storage policy mismatches, SPS moves the replicas to a different storage type in order to fulfill the storage policy requirement. Since these API calls go to the Namenode for tracking the invoked satisfier paths (inodes), the administrator needs to enable the dfs.storage.policy.satisfier.mode config at the Namenode to allow these operations.
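For example (hypothetical path; assumes SPS has been enabled and started as described above):

    # Queue the path for satisfaction; the call returns immediately and the
    # block moves happen asynchronously via Datanode movement commands.
    hdfs storagepolicies -satisfyStoragePolicy -path /data/logs

    # Inspect which policy is in effect for the path:
    hdfs storagepolicies -getStoragePolicy -path /data/logs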

