As enterprises deal with increasingly sensitive data, securing access to big data platforms like Hadoop becomes critical. Kerberos, a trusted network authentication protocol, is the foundation of Hadoop’s security model.

In this blog, we walk through the steps for setting up HDFS with Kerberos authentication, helping you protect your data and ensure only authorized users and services can access your Hadoop cluster.


Why Use Kerberos with HDFS?

By default, HDFS uses simple authentication: it trusts whatever username the client reports, so effectively any user can access the file system. With Kerberos:

  • Clients and services must authenticate with a Key Distribution Center (KDC)
  • All interactions (NameNode, DataNode, clients) are mutually authenticated
  • Unauthorized access and impersonation are prevented

For compliance and security best practices, Kerberos is effectively mandatory in any production-grade, secure Hadoop cluster.


Key Concepts

  • KDC: Central authority issuing tickets and managing principals
  • Principal: Unique identity in the Kerberos realm (e.g., hdfs/nn.hadoop.local@EXAMPLE.COM)
  • Keytab: File containing a principal's secret keys, used for non-interactive (passwordless) login
  • Ticket Granting Ticket (TGT): Temporary credential for accessing services
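
To see these pieces in action, here is a quick illustration of obtaining and inspecting a TGT (it assumes the hdfs-user@EXAMPLE.COM principal created in Step 2 below already exists):

# Request a TGT for the principal (prompts for its password)
kinit hdfs-user@EXAMPLE.COM

# Inspect the credential cache: shows the principal, the TGT, and its expiry time
klist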

Step 1: Set Up a Kerberos KDC

You can use MIT Kerberos or Active Directory as your KDC. For testing, install MIT Kerberos:

sudo apt install krb5-kdc krb5-admin-server krb5-config
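
On RHEL- or CentOS-based systems the packages are named differently; a rough equivalent (exact package names may vary by release) is:

sudo yum install krb5-server krb5-libs krb5-workstation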

Edit /etc/krb5.conf:

[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.hadoop.local
        admin_server = kdc.hadoop.local
    }

[domain_realm]
    .hadoop.local = EXAMPLE.COM
    hadoop.local = EXAMPLE.COM

Create the Kerberos database:

sudo krb5_newrealm
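
You can confirm the realm is usable by listing the principals krb5_newrealm created and checking that the KDC daemons are running (the service names below are the ones used by the Debian/Ubuntu packages installed above):

sudo kadmin.local -q "listprincs"
sudo systemctl status krb5-kdc krb5-admin-server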

Step 2: Create Kerberos Principals for Hadoop Services

Create service principals for the NameNode and DataNode, plus a user principal for clients:

kadmin.local -q "addprinc -randkey hdfs/nn.hadoop.local@EXAMPLE.COM"
kadmin.local -q "addprinc -randkey hdfs/dn.hadoop.local@EXAMPLE.COM"
kadmin.local -q "addprinc hdfs-user@EXAMPLE.COM"

Generate keytabs:

kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab hdfs/nn.hadoop.local"
kadmin.local -q "ktadd -k /etc/security/keytabs/dn.service.keytab hdfs/dn.hadoop.local"

Distribute keytabs securely to their respective nodes.
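
Before moving on, it is worth verifying each keytab on its target node and locking down its permissions. A sketch for the NameNode keytab (the hdfs:hadoop ownership assumes the daemons run as the hdfs user; adjust to your installation):

# List the principals and key versions stored in the keytab
klist -kt /etc/security/keytabs/nn.service.keytab

# Confirm the keytab can obtain a ticket without a password
kinit -kt /etc/security/keytabs/nn.service.keytab hdfs/nn.hadoop.local@EXAMPLE.COM

# Restrict ownership and permissions
sudo chown hdfs:hadoop /etc/security/keytabs/nn.service.keytab
sudo chmod 400 /etc/security/keytabs/nn.service.keytab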


Step 3: Configure Hadoop for Kerberos

Update core-site.xml:

<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
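
Many deployments also define auth_to_local rules in core-site.xml so that Kerberos principals map cleanly onto local OS user names. A minimal sketch for the principals used in this guide (treat the rules as a starting point and adapt them to your realm and service accounts):

<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0](hdfs@EXAMPLE.COM)s/.*/hdfs/
    DEFAULT
  </value>
</property>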

Update hdfs-site.xml:

<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hdfs/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/security/keytabs/nn.service.keytab</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/etc/security/keytabs/dn.service.keytab</value>
</property>

Hadoop substitutes _HOST with the local node's FQDN at runtime, so the same configuration files can be shared across all nodes.
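
One more detail worth noting: in a Kerberized cluster the DataNode must also protect its data transfer ports, either by binding to privileged ports under jsvc or by enabling SASL on the data transfer protocol. A minimal sketch of the SASL route in hdfs-site.xml (assumes Hadoop 2.6+ with HTTPS configured for the web endpoints):

<property>
  <name>dfs.data.transfer.protection</name>
  <value>authentication</value>
</property>
<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>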


Step 4: Enable Kerberos on the Cluster

Restart HDFS with the updated configuration:

hdfs --daemon stop namenode
hdfs --daemon start namenode
hdfs --daemon stop datanode
hdfs --daemon start datanode

Authenticate manually:

kinit hdfs-user@EXAMPLE.COM

Then check HDFS access:

hdfs dfs -ls /
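
To confirm that authentication is actually being enforced rather than merely configured, destroy the ticket and repeat the listing; without a valid TGT the command should fail with a GSS/SASL authentication error (the exact message varies by version):

kdestroy
hdfs dfs -ls /                  # should now be rejected
kinit hdfs-user@EXAMPLE.COM     # re-authenticate to restore access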

Step 5: Secure Web UIs and HTTP Services

Secure HDFS web UIs using SPNEGO:

<property>
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
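
The HTTP principal and spnego.service.keytab referenced above must be created and distributed just like the service keytabs in Step 2. A sketch for the NameNode host (the hostname follows the earlier nn.hadoop.local example):

kadmin.local -q "addprinc -randkey HTTP/nn.hadoop.local@EXAMPLE.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/spnego.service.keytab HTTP/nn.hadoop.local"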

Enable SSL and Kerberos for HDFS and YARN UIs for complete security coverage.


Best Practices for Kerberos and HDFS

  • Use strong password policies for Kerberos principals
  • Restrict and audit access to keytab files
  • Use automation tools like Ambari or Cloudera Manager for managing credentials
  • Monitor ticket expiration and automate kinit refresh using cron or scripts (see the sketch after this list)
  • Store keytabs securely with limited permissions (chmod 400)
  • Integrate Kerberos with Ranger for fine-grained access control
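
For the kinit refresh mentioned above, a simple cron entry that re-authenticates from a keytab is usually enough. A sketch (the keytab path and principal are illustrative; use whichever account runs your jobs):

# Re-obtain a TGT from a keytab every 8 hours, before the ticket expires
0 */8 * * * /usr/bin/kinit -kt /etc/security/keytabs/hdfs-user.keytab hdfs-user@EXAMPLE.COM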

Troubleshooting Tips

  • Run klist to verify TGT is active
  • Use hdfs dfs -ls / to test authentication
  • Check logs in /var/log/hadoop-hdfs/ for ticket or keytab errors
  • Ensure time synchronization across nodes; Kerberos rejects requests when clock skew exceeds roughly five minutes (see the diagnostic sketch after this list)
  • Test SPNEGO with curl --negotiate -u : http://nn.hadoop.local:9870 (use port 50070 on Hadoop 2)
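
Pulling a few of these checks together, a quick diagnostic pass might look like this (chrony and the log file naming are assumptions; adjust for your distribution):

klist                                                  # is there a valid, unexpired TGT?
chronyc tracking                                       # clock offset must stay well under 5 minutes
sudo tail -n 50 /var/log/hadoop-hdfs/*namenode*.log    # recent ticket or keytab errors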

Conclusion

Securing HDFS with Kerberos is a fundamental step in building a production-grade Hadoop cluster. With strong authentication, ticket-based access control, and encryption support, Kerberos provides the foundation for secure big data infrastructure.

By following this guide and applying best practices, you can enforce robust access controls and meet regulatory requirements with confidence.