Setting Up HDFS with Kerberos for Enhanced Security
Secure your Hadoop cluster by integrating HDFS with Kerberos for strong authentication
As enterprises deal with increasingly sensitive data, securing access to big data platforms like Hadoop becomes critical. Kerberos, a trusted network authentication protocol, is the foundation of Hadoop’s security model.
In this blog, we walk through the steps for setting up HDFS with Kerberos authentication, helping you protect your data and ensure only authorized users and services can access your Hadoop cluster.
Why Use Kerberos with HDFS?
By default, HDFS uses "simple" authentication: it trusts whatever username the client claims, so any user can act as any other. With Kerberos:
- Clients and services must authenticate with a Key Distribution Center (KDC)
- All interactions (NameNode, DataNode, clients) are mutually authenticated
- Unauthorized access and impersonation are prevented
Kerberos is effectively mandatory in secure Hadoop clusters, both for compliance requirements and for security best practices.
Key Concepts
- KDC: Central authority issuing tickets and managing principals
- Principal: Unique identity in the Kerberos realm (e.g., hdfs/nn.hadoop.local@EXAMPLE.COM)
- Keytab: File holding a principal's long-term keys, enabling non-interactive login
- Ticket Granting Ticket (TGT): Temporary credential for accessing services
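To make the TGT concrete: a user obtains one with kinit and inspects it with klist. A quick example, assuming the hdfs-user principal created later in this guide:

kinit hdfs-user@EXAMPLE.COM   # prompts for the password and caches a TGT
klist                         # shows the cached TGT and its expiry time
kdestroy                      # discards the ticket cache when done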
Step 1: Set Up a Kerberos KDC
You can use MIT Kerberos or Active Directory as your KDC. For testing, install MIT Kerberos:
sudo apt install krb5-kdc krb5-admin-server krb5-config
Edit /etc/krb5.conf:
[libdefaults]
    default_realm = EXAMPLE.COM

[realms]
    EXAMPLE.COM = {
        kdc = kdc.hadoop.local
        admin_server = kdc.hadoop.local
    }

[domain_realm]
    .hadoop.local = EXAMPLE.COM
    hadoop.local = EXAMPLE.COM
Create the Kerberos database:
sudo krb5_newrealm
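Before moving on, it is worth confirming the KDC is up. A quick sanity check, assuming the Ubuntu packages installed above:

sudo systemctl status krb5-kdc krb5-admin-server   # both services should be active
sudo kadmin.local -q "listprincs"                  # should list built-ins such as krbtgt/EXAMPLE.COM@EXAMPLE.COM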
Step 2: Create Kerberos Principals for Hadoop Services
Create principals for NameNode, DataNode, and HDFS users:
kadmin.local -q "addprinc -randkey hdfs/nn.hadoop.local@EXAMPLE.COM"
kadmin.local -q "addprinc -randkey hdfs/dn.hadoop.local@EXAMPLE.COM"
kadmin.local -q "addprinc hdfs-user@EXAMPLE.COM"
Generate keytabs:
kadmin.local -q "ktadd -k /etc/security/keytabs/nn.service.keytab hdfs/nn.hadoop.local"
kadmin.local -q "ktadd -k /etc/security/keytabs/dn.service.keytab hdfs/dn.hadoop.local"
Distribute keytabs securely to their respective nodes.
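For example, one way to push the DataNode keytab to its host and lock down ownership and permissions (the hdfs user and hadoop group are assumptions; match your own service account):

scp /etc/security/keytabs/dn.service.keytab root@dn.hadoop.local:/etc/security/keytabs/
ssh root@dn.hadoop.local 'chown hdfs:hadoop /etc/security/keytabs/dn.service.keytab && chmod 400 /etc/security/keytabs/dn.service.keytab'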
Step 3: Configure Hadoop for Kerberos
Update core-site.xml:
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>
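You will often also want to map Kerberos principals to local OS users with hadoop.security.auth_to_local in core-site.xml. A sketch that maps the hdfs service principals above to the local hdfs account (adjust the realm and user to your environment):

<property>
  <name>hadoop.security.auth_to_local</name>
  <value>
    RULE:[2:$1@$0](hdfs@EXAMPLE\.COM)s/.*/hdfs/
    DEFAULT
  </value>
</property>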
Update hdfs-site.xml:
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.datanode.kerberos.principal</name>
  <value>hdfs/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
<property>
  <name>dfs.namenode.keytab.file</name>
  <value>/etc/security/keytabs/nn.service.keytab</value>
</property>
<property>
  <name>dfs.datanode.keytab.file</name>
  <value>/etc/security/keytabs/dn.service.keytab</value>
</property>
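One caveat: on a Kerberized cluster, DataNodes must either bind to privileged ports (below 1024, typically via jsvc) or protect the data transfer protocol with SASL. A minimal sketch of the SASL route, which also requires running the web UI over HTTPS:

<property>
  <name>dfs.data.transfer.protection</name>
  <value>authentication</value>
</property>
<property>
  <name>dfs.http.policy</name>
  <value>HTTPS_ONLY</value>
</property>

The values integrity and privacy are stronger alternatives to authentication if you also need tamper protection or wire encryption.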
Hadoop substitutes _HOST with each node's FQDN at runtime, so the same configuration files can be deployed across the cluster.
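To confirm the substitution will line up with your keytabs, compare each node's FQDN with the principals stored in its keytab:

hostname -f                                          # e.g., nn.hadoop.local
klist -kt /etc/security/keytabs/nn.service.keytab    # should list hdfs/nn.hadoop.local@EXAMPLE.COM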
Step 4: Enable Kerberos on the Cluster
Restart HDFS with the updated configuration:
hdfs --daemon stop namenode
hdfs --daemon start namenode
hdfs --daemon stop datanode
hdfs --daemon start datanode
Authenticate manually:
kinit hdfs-user@EXAMPLE.COM
Then check HDFS access:
hdfs dfs -ls /
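To verify that Kerberos is actually enforced rather than merely configured, destroy your ticket and retry; the listing should now fail with a GSS/authentication error:

kdestroy
hdfs dfs -ls /                  # expected to fail, e.g., with "GSS initiate failed"
kinit hdfs-user@EXAMPLE.COM     # re-authenticate to restore access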
Step 5: Secure Web UIs and HTTP Services
Secure HDFS web UIs using SPNEGO:
<property>
  <name>dfs.web.authentication.kerberos.keytab</name>
  <value>/etc/security/keytabs/spnego.service.keytab</value>
</property>
<property>
  <name>dfs.web.authentication.kerberos.principal</name>
  <value>HTTP/_HOST@EXAMPLE.COM</value>
</property>
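Note that this configuration references an HTTP principal and keytab that were not created in Step 2. A sketch of adding them on the KDC, with the keytab path matching the configuration above:

kadmin.local -q "addprinc -randkey HTTP/nn.hadoop.local@EXAMPLE.COM"
kadmin.local -q "ktadd -k /etc/security/keytabs/spnego.service.keytab HTTP/nn.hadoop.local"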
Enable SSL and Kerberos for HDFS and YARN UIs for complete security coverage.
Best Practices for Kerberos and HDFS
- Use strong password policies for Kerberos principals
- Restrict and audit access to keytab files
- Use automation tools like Ambari or Cloudera Manager for managing credentials
- Monitor ticket expiration and automate kinit refresh using cron or scripts (see the sketch after this list)
- Store keytabs securely with limited permissions (chmod 400)
- Integrate Kerberos with Apache Ranger for fine-grained access control
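As an example of automating ticket refresh, a cron entry that re-authenticates from a keytab every eight hours (the hdfs-user keytab path is an assumption; export it with ktadd as in Step 2):

0 */8 * * * kinit -kt /etc/security/keytabs/hdfs-user.keytab hdfs-user@EXAMPLE.COM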
Troubleshooting Tips
- Run klist to verify your TGT is active
- Use hdfs dfs -ls / to test authentication
- Check logs in /var/log/hadoop-hdfs/ for ticket or keytab errors
- Ensure time synchronization across nodes; Kerberos is sensitive to clock skew (see the check after this list)
- Test SPNEGO with curl --negotiate -u : http://nn.hadoop.local:9870 (port 50070 on Hadoop 2)
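For the clock-skew check, a quick test on systemd-based nodes (by default Kerberos rejects requests with more than roughly five minutes of skew):

timedatectl | grep -i synchronized   # should report "System clock synchronized: yes"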
Conclusion
Securing HDFS with Kerberos is a fundamental step in building a production-grade Hadoop cluster. With strong authentication, ticket-based access control, and encryption support, Kerberos provides the foundation for secure big data infrastructure.
By following this guide and applying best practices, you can enforce robust access controls and meet regulatory requirements with confidence.