In a world driven by data, security and compliance are more important than ever — especially in enterprise-grade Hadoop environments. While HDFS provides basic POSIX-style permissions, modern organizations need fine-grained, centralized, and auditable access control.

Enter Apache Ranger, a security framework that enables policy-based authorization and auditing across Hadoop components. Integrating HDFS with Ranger allows administrators to define who can access what and how, without relying on complex OS-level permissions.

In this post, we’ll explore how to configure and integrate HDFS with Apache Ranger for robust access control and governance.


What is Apache Ranger?

Apache Ranger is a centralized platform to:

  • Define, manage, and enforce access control policies
  • Support role-based access across Hadoop services (HDFS, Hive, HBase, Kafka, etc.)
  • Audit user access through centralized logging
  • Provide LDAP/AD integration for authentication

Ranger uses plugins that are installed on each Hadoop component to intercept requests and enforce policies.


Why Use Ranger for HDFS Security?

Feature                      HDFS Native ACLs     Apache Ranger
Centralized management       ❌                   ✅
Role-based access            ❌                   ✅
Auditing and logging         Limited              ✅ (to Solr/Log4j)
REST API/GUI for control     ❌                   ✅
Integration with LDAP        Manual               Native

Ranger provides a GUI and REST API to define path-level policies, user/group-based access, and audit logging — essential for secure data lakes.


Prerequisites

  • Hadoop installed and configured (HDFS working)
  • Apache Ranger Admin and HDFS Plugin installed
  • Access to a web browser for Ranger Admin UI
  • (Optional) LDAP/AD for centralized user management

Step 1: Install the Ranger HDFS Plugin

The plugin enforces Ranger policies by intercepting HDFS access requests.

Steps:

  1. Download Apache Ranger from the Apache Ranger site (the Apache release ships as source; most Hadoop distributions provide prebuilt plugin tarballs)
  2. Extract the HDFS plugin tarball and navigate into its directory
  3. Run the setup script:
./enable-hdfs-plugin.sh
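
Putting the steps together, a condensed sketch (the version number and tarball name are assumptions; substitute your own release):

# Hypothetical version and paths; adjust for your environment
tar -xzf ranger-2.4.0-hdfs-plugin.tar.gz
cd ranger-2.4.0-hdfs-plugin
# Review install.properties first (see the example further below), then:
sudo ./enable-hdfs-plugin.sh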

You’ll need to provide (interactively in older releases, via install.properties in newer ones):

  • Ranger Admin URL
  • Hadoop conf directory
  • Plugin repo name (e.g., hdfs_prod)
  • Authentication details (e.g., UNIX, LDAP)

The script installs the plugin libraries and updates the Hadoop configuration; in particular, it registers the Ranger authorizer in hdfs-site.xml. The plugin is loaded when the NameNode restarts.
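
For reference, a minimal install.properties sketch, assuming the hdfs_prod repository name (hostnames are placeholders):

# Ranger Admin endpoint the plugin will pull policies from
POLICY_MGR_URL=http://ranger-admin:6080
# Must match the service name defined in Ranger Admin (Step 3)
REPOSITORY_NAME=hdfs_prod
# Audit destinations are configured via XAAUDIT.* keys in this same file;
# exact key names vary between Ranger versions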


Step 2: Configure HDFS for Ranger

Ensure the following properties are set in core-site.xml:

<property>
  <name>hadoop.security.authorization</name>
  <value>true</value>
</property>

<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value> <!-- or simple -->
</property>

The Ranger-specific settings below are written by the setup script into the plugin's own configuration (typically ranger-hdfs-security.xml, not core-site.xml); verify they point at your Ranger Admin and repository:

<property>
  <name>ranger.plugin.hdfs.policy.rest.url</name>
  <value>http://<ranger-admin-host>:6080</value>
</property>

<property>
  <name>ranger.plugin.hdfs.service.name</name>
  <value>hdfs_prod</value>
</property>

Restart the NameNode after these changes (the plugin runs inside the NameNode; DataNodes only need a restart if their configuration changed).
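
To apply the changes, a minimal restart sketch using Hadoop 3.x daemon commands (run as the HDFS service user; older versions use hadoop-daemon.sh instead):

# Restart the NameNode so it loads the Ranger authorizer
hdfs --daemon stop namenode
hdfs --daemon start namenode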


Step 3: Create HDFS Repository in Ranger Admin

In the Ranger Admin UI:

  1. Go to Service Manager > HDFS > Add New Service
  2. Fill in:
    • Service Name: hdfs_prod (must match the repository name the plugin was configured with)
    • Username: typically hdfs or another service user
    • NameNode URL (fs.default.name): e.g., hdfs://namenode-host:8020
  3. Test the connection and save

Ranger will now manage access control for your HDFS cluster.
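
The same service can also be created programmatically. A sketch using Ranger's public v2 REST API, assuming default admin credentials and placeholder hostnames:

# Create the hdfs_prod service via REST (credentials/hosts are placeholders)
curl -u admin:admin -X POST -H "Content-Type: application/json" \
  http://ranger-admin:6080/service/public/v2/api/service \
  -d '{
    "name": "hdfs_prod",
    "type": "hdfs",
    "configs": {
      "username": "hdfs",
      "password": "hdfs-password",
      "fs.default.name": "hdfs://namenode-host:8020"
    }
  }'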


Step 4: Define Access Policies

Within the Ranger UI:

  • Navigate to your hdfs_prod service
  • Click Add New Policy
  • Define:
    • Path: /data/finance/
    • Users: analyst_user
    • Permissions: read, execute
    • Audit Logging: enabled by default

You can also assign groups, roles, or conditions (e.g., IP-based access).

Example: Allow only the finance_team group to access /data/finance.
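
Policies can be scripted the same way. A sketch of the finance_team policy above via the public v2 API (field names follow Ranger's policy JSON; hosts and credentials are placeholders):

# Grant finance_team read/execute on /data/finance, recursively
curl -u admin:admin -X POST -H "Content-Type: application/json" \
  http://ranger-admin:6080/service/public/v2/api/policy \
  -d '{
    "service": "hdfs_prod",
    "name": "finance-read",
    "resources": {
      "path": { "values": ["/data/finance"], "isRecursive": true }
    },
    "policyItems": [{
      "groups": ["finance_team"],
      "accesses": [
        { "type": "read", "isAllowed": true },
        { "type": "execute", "isAllowed": true }
      ]
    }]
  }'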


Step 5: Enable Auditing

Ranger logs all access attempts — allowed or denied — via:

  • Solr (default)
  • Log4j
  • HDFS audit logs

Audit entries include:

  • Resource path
  • Access type
  • User identity
  • IP address
  • Result (allowed/denied)

You can monitor these logs via Ranger’s audit UI or integrate with SIEM tools for enterprise visibility.
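
When Solr is the audit store, the audit index can also be queried directly. A sketch assuming Ranger's default ranger_audits collection (host, port, and field usage may vary by deployment):

# List denied events (result:0) for the hdfs_prod repository
curl "http://solr-host:6083/solr/ranger_audits/select?q=repo:hdfs_prod%20AND%20result:0&wt=json"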


Step 6: Validate Policy Enforcement

Test with a user:

sudo -u analyst_user hdfs dfs -ls /data/finance/

If access is denied, check:

  • The Ranger policy for the path and user (plugins poll Ranger Admin for updates, by default roughly every 30 seconds)
  • NameNode logs, where plugin messages appear (commonly under /var/log/hadoop/hdfs)
  • Audit logs in the Ranger UI

Policy changes made in Ranger take effect without restarting Hadoop; the plugin picks up updates automatically.
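
A quick negative test, assuming analyst_user was granted only read and execute in Step 4 (file names here are hypothetical):

# Should succeed: read access was granted
sudo -u analyst_user hdfs dfs -cat /data/finance/report.csv
# Should be denied: no write permission in the policy
sudo -u analyst_user hdfs dfs -put local.csv /data/finance/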


Best Practices

  • Use Ranger groups and LDAP integration for scalable role management
  • Use explicit deny conditions deliberately; in Ranger, a deny takes precedence over an allow for the same resource
  • Audit sensitive directories aggressively
  • Tighten HDFS native permissions once Ranger is fully enabled; by default the plugin falls back to native permissions when no Ranger policy matches
  • Regularly back up Ranger policies and configuration (see the export sketch below)
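
For the backup item above, a sketch using Ranger's policy export endpoint (available in recent releases; credentials are placeholders):

# Export all policies to a JSON file for backup
curl -u admin:admin -o ranger-policies-backup.json \
  "http://ranger-admin:6080/service/plugins/policies/exportJson"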

Conclusion

Integrating HDFS with Apache Ranger offers powerful, centralized security for your Hadoop data lake. It replaces complex HDFS ACLs with intuitive, scalable, and auditable policies that are easy to manage and enforce.

With Ranger, organizations can confidently apply compliance rules, enforce least-privilege access, and gain full visibility into how data is accessed — all while maintaining Hadoop’s scalability and flexibility.