HDFS Lease Management

Last modified : 1 March, 2017

Although HDFS draws inspiration from GFS (sheesh... alright, it's almost a blatant copy), there is one important difference between them. In both filesystems, a writer must obtain an exclusive lock on a file before it is allowed to write, append, or truncate data in that file. Notably, this exclusive lock does NOT prevent other clients from reading the file, so one client can be writing a file while another reads it. How these locks are tracked differs vastly between GFS and HDFS, and the difference has interesting implications.

Leases

In HDFS these locks are called Leases. A lease is granted to a client that requests to open a file for a write operation (e.g. to create, append to, or truncate a file). Every lease belongs to a single HDFS client but can cover several HDFS files. It is not unusual for a single lease to cover several thousand files that one HDFS client has open for write. As the client opens and closes files, the appropriate lease must be identified and updated. The exact data structures have changed quite frequently over the years to provide better lookups, better reverse lookups, speed, space efficiency, etc. All of this accounting, however, is done on the NameNode.

This is in stark contrast to GFS, where lease tracking is shared between the master server (GFS's counterpart of the NameNode) and the chunk servers (its counterpart of the DataNodes); see Section 3.1 of the Google File System paper. For HDFS this means the NameNode carries the extra overhead of maintaining these leases (something GFS expressly wanted to avoid). However, it also lets HDFS permit renames of files that are being written (which in my experience is not too uncommon an operation).
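To make the bookkeeping concrete, here is a minimal sketch of a lease table with a forward lookup (client to lease) and a reverse lookup (file to client). It is written in Python for brevity and is NOT the actual HDFS implementation: the names `Lease`, `LeaseTable`, `open_for_write`, `close` and `holder_of` are all hypothetical, and keying files by a numeric id instead of a path is my own assumption for the sake of illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Lease:
    """One lease per client (holder); it may cover many open files."""
    holder: str
    file_ids: set = field(default_factory=set)


class LeaseTable:
    """Hypothetical NameNode-side bookkeeping, not the real HDFS classes."""

    def __init__(self):
        self.by_holder = {}  # holder -> Lease (forward lookup)
        self.by_file = {}    # file id -> holder (reverse lookup)

    def open_for_write(self, holder, file_id):
        # The lock is exclusive among writers: only one holder per file.
        current = self.by_file.get(file_id)
        if current is not None and current != holder:
            raise PermissionError(
                f"file {file_id} is already leased to {current}")
        # Reuse the client's existing lease, or create one on first open.
        lease = self.by_holder.setdefault(holder, Lease(holder))
        lease.file_ids.add(file_id)
        self.by_file[file_id] = holder
        return lease

    def close(self, holder, file_id):
        lease = self.by_holder.get(holder)
        if lease is None or file_id not in lease.file_ids:
            raise KeyError(f"{holder} holds no lease on file {file_id}")
        lease.file_ids.discard(file_id)
        del self.by_file[file_id]
        # Drop the lease once the client has no files open for write.
        if not lease.file_ids:
            del self.by_holder[holder]

    def holder_of(self, file_id):
        # Reverse lookup: who (if anyone) is writing this file?
        return self.by_file.get(file_id)


table = LeaseTable()
table.open_for_write("client-A", 1001)
table.open_for_write("client-A", 1002)  # same lease, second file
```

Readers never touch this table, which matches the point that the lease does not block reads. In this sketch a rename would only change the path that maps to a file id, so the lease entry itself would not need to move; whether HDFS does it this way at any given version is something to check against the source.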

All content on this website is licensed as Creative Commons-Attribution-ShareAlike 4.0 License. Opinions expressed are solely my own.