Last modified: 1 July, 2017
Hadoop uses something called Delegation Tokens (DTs) for most authentication purposes. The concept may well be borrowed from other systems, but I first encountered it in Hadoop. Although Hadoop does use Kerberos, it only does so for the initial authentication. I’ve heard from multiple sources that this was done because the Kerberos KDC wasn’t expected to handle several thousand clients trying to authenticate (possibly for every application).

Clients and services are usually set up with keytab files holding a principal in the format service/hostname@KERBEROS_REALM. When talking to a Hadoop service, you first need to acquire a valid Kerberos ticket. The client presents this ticket to the Hadoop server, which verifies it and thereby confirms that you are who you say you are. The Hadoop service then issues you a delegation token, and in all subsequent communication you use the DT, rather than Kerberos, to prove your identity to that server. Incidentally, one of the problems with running long-running services on secure Hadoop clusters is the renewal and expiration of these very DTs.
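To make the flow concrete, here is a minimal, self-contained Python sketch of a delegation-token secret manager. It is not Hadoop’s code (Hadoop’s is Java): the class and method names, the JSON identifier encoding, the HMAC-SHA256 choice, the principals and the lifetimes are all assumptions for illustration, and Hadoop’s own identifier format and HMAC algorithm may differ. Kerberos is stubbed out: we simply assume the caller of issue() has already been authenticated with a ticket. What it does show is the idea described above: the token’s password is an HMAC of the token identifier under a server-side secret key, so the server can verify a presented token without going back to the KDC, and renewal pushes the expiry forward only up to a fixed maximum lifetime, after which the token is dead.

```python
import hashlib
import hmac
import json
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    identifier: bytes  # serialized fields: kind, owner, renewer, sequence number, dates
    password: bytes    # HMAC of the identifier under the issuing server's secret key
    kind: str


class DelegationTokenSecretManager:
    """Hypothetical, simplified stand-in for a Hadoop service's secret manager."""

    def __init__(self, kind, renew_interval=24 * 3600, max_lifetime=7 * 24 * 3600):
        self.kind = kind
        self._key = secrets.token_bytes(32)  # server-side secret, never sent to clients
        self._renew_interval = renew_interval
        self._max_lifetime = max_lifetime
        self._seq = 0
        self._expiry = {}  # server-side record: identifier -> current expiry time

    def _password(self, identifier):
        return hmac.new(self._key, identifier, hashlib.sha256).digest()

    def issue(self, kerberos_principal, renewer, now):
        # Assumption: the caller has already verified a Kerberos ticket for this principal.
        self._seq += 1
        identifier = json.dumps({
            "kind": self.kind, "owner": kerberos_principal, "renewer": renewer,
            "seq": self._seq, "issue": now, "max": now + self._max_lifetime,
        }, sort_keys=True).encode()
        self._expiry[identifier] = now + self._renew_interval
        return Token(identifier, self._password(identifier), self.kind)

    def verify(self, token, now):
        """Return the token's owner if it is genuine and unexpired, else None."""
        if token.kind != self.kind:
            return None
        if not hmac.compare_digest(token.password, self._password(token.identifier)):
            return None  # forged or tampered token
        expiry = self._expiry.get(token.identifier)
        if expiry is None or now > expiry:
            return None  # unknown to this server, or not renewed in time
        return json.loads(token.identifier)["owner"]

    def renew(self, token, caller, now):
        """Extend the expiry by one interval, capped at the token's max lifetime."""
        if self.verify(token, now) is None:
            raise PermissionError("invalid or expired token")
        fields = json.loads(token.identifier)
        if caller != fields["renewer"]:
            raise PermissionError("only the designated renewer may renew")
        new_expiry = min(now + self._renew_interval, fields["max"])
        self._expiry[token.identifier] = new_expiry
        return new_expiry


# Usage: a Kerberos-authenticated "alice" obtains a token, then presents it later.
# The renewer principal below is hypothetical.
nn = DelegationTokenSecretManager("HDFS_DELEGATION_TOKEN")
tok = nn.issue("alice@EXAMPLE.COM", renewer="rm/host@EXAMPLE.COM", now=0)
print(nn.verify(tok, now=3600))    # alice@EXAMPLE.COM
print(nn.verify(tok, now=90_000))  # None: past the 24h renewal interval
```

The renew() cap is what bites long-running services: however diligently a token is renewed, it stops working at its maximum lifetime, so a service that outlives it has to obtain a fresh token somehow (for example by authenticating with Kerberos again).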
Each Hadoop service issues a different “kind” of token. Each Hadoop server implements its own SecretManager, which generates and tracks these tokens, so the NameNode has its own SecretManager and so does the ResourceManager. I’m guessing that these services don’t share one token because in the early days of Hadoop it wasn’t known which services would be deployed together (e.g. HDFS without MapReduce, or YARN without HDFS). I also know people who were crucial to Hadoop’s development who contemplated using X.509 certificates for authentication instead. Alas, we’ll never know whether that would have been better.
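The per-service split can be illustrated with an even smaller sketch, again a hypothetical Python model rather than Hadoop’s API: two independent secret managers, each with its own randomly generated key, standing in for the NameNode and the ResourceManager. The kind strings mirror the names Hadoop reports (HDFS_DELEGATION_TOKEN, RM_DELEGATION_TOKEN); the tuple token format, the HMAC-SHA256 choice and the credentials dictionary (loosely modeled on Hadoop’s Credentials object, which holds tokens by alias) are assumptions. The point is that a client needs a separate token for every service it talks to, because each server can only verify passwords derived from its own key.

```python
import hashlib
import hmac
import secrets


class SecretManager:
    """Hypothetical per-service secret manager: every instance has its own key."""

    def __init__(self, kind):
        self.kind = kind
        self._key = secrets.token_bytes(32)

    def _sign(self, identifier):
        return hmac.new(self._key, identifier, hashlib.sha256).digest()

    def issue(self, owner):
        # Simplified identifier; a real one would also carry dates and a sequence number.
        identifier = f"{self.kind}:{owner}".encode()
        return (self.kind, identifier, self._sign(identifier))

    def accepts(self, token):
        kind, identifier, password = token
        return kind == self.kind and hmac.compare_digest(password, self._sign(identifier))


namenode = SecretManager("HDFS_DELEGATION_TOKEN")
resourcemanager = SecretManager("RM_DELEGATION_TOKEN")

# A job submitter collects one token per service it will talk to (aliases are hypothetical).
credentials = {
    "hdfs": namenode.issue("alice@EXAMPLE.COM"),
    "yarn": resourcemanager.issue("alice@EXAMPLE.COM"),
}

print(namenode.accepts(credentials["hdfs"]))         # True
print(resourcemanager.accepts(credentials["hdfs"]))  # False: wrong kind and wrong key
```

Relabeling an HDFS token as an RM token does not help an attacker either: the ResourceManager recomputes the HMAC with its own key and the passwords will not match.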
All content on this website is licensed under the Creative Commons Attribution-ShareAlike 4.0 License. Opinions expressed are solely my own.