Tuesday, May 9, 2017

Securing Apache Hadoop Distributed File System (HDFS) - part VI

This is the sixth and final article in a series of posts on securing HDFS. In the second and third posts we looked at how to use Apache Ranger to authorize access to data stored in HDFS. In the fifth post, we looked at how to configure HDFS to authenticate users via Kerberos. In this post we will combine both scenarios: we will use Apache Ranger to authorize access to an HDFS instance that is secured using Kerberos.

1) Authenticating to Apache Ranger

Follow the fifth tutorial to set up HDFS using Kerberos for authentication. Then follow the second tutorial to install the Apache Ranger HDFS plugin. At this point the Ranger HDFS plugin will not be able to download new policies from Apache Ranger, as we have not yet configured Ranger to authenticate clients via Kerberos. Open 'conf/ranger-admin-site.xml' in the Apache Ranger Admin service and set the following properties:
  • ranger.spnego.kerberos.principal: HTTP/localhost@hadoop.apache.org
  • ranger.spnego.kerberos.keytab: Path to Kerby ranger.keytab
  • hadoop.security.authentication: kerberos
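As a rough sketch, the three properties above might look like this inside the existing <configuration> element of 'conf/ranger-admin-site.xml', assuming the usual Hadoop-style property layout. The keytab path is a placeholder, not a real location; substitute the path to the ranger.keytab generated by Kerby.

```xml
<!-- Sketch only: place inside the existing <configuration> element. -->
<property>
  <name>ranger.spnego.kerberos.principal</name>
  <value>HTTP/localhost@hadoop.apache.org</value>
</property>
<property>
  <name>ranger.spnego.kerberos.keytab</name>
  <!-- Placeholder path (assumption): point this at the Kerby ranger.keytab -->
  <value>/path/to/kerby/ranger.keytab</value>
</property>
<property>
  <name>hadoop.security.authentication</name>
  <value>kerberos</value>
</property>
```

If any of these properties already exist in the file, edit the existing entries rather than adding duplicates.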
Now we need to configure the Java Kerberos implementation used by Ranger to read the krb5.conf file generated by Apache Kerby:
  • export JAVA_OPTS="-Djava.security.krb5.conf=<path to Kerby target/krb5.conf>"
Start the Apache Ranger admin service ('sudo -E ranger-admin start' to pass the JAVA_OPTS variable through) and edit the "cl1_hadoop" service that was created in the second tutorial. Under "Add New Configurations" add the following:
  • policy.download.auth.users: hdfs
The Ranger HDFS plugin should now be able to download the policies from the Ranger Admin service and apply authorization accordingly.

2) Authenticating to HDFS

As we have configured HDFS to require Kerberos, we can no longer see the HDFS directories in the Ranger Admin service when creating policies, unless we make some changes to enable the Ranger Admin service to authenticate to HDFS. Open 'conf/ranger-admin-site.xml' in the Apache Ranger Admin service and set the following properties:
  • ranger.lookup.kerberos.principal: ranger/localhost@hadoop.apache.org
  • ranger.lookup.kerberos.keytab: Path to Kerby ranger.keytab
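Expressed as entries in 'conf/ranger-admin-site.xml', the two lookup properties above could be sketched as follows, again assuming the Hadoop-style property layout. The keytab path is a placeholder for wherever Kerby wrote ranger.keytab.

```xml
<!-- Sketch only: place inside the existing <configuration> element. -->
<property>
  <name>ranger.lookup.kerberos.principal</name>
  <value>ranger/localhost@hadoop.apache.org</value>
</property>
<property>
  <name>ranger.lookup.kerberos.keytab</name>
  <!-- Placeholder path (assumption): point this at the Kerby ranger.keytab -->
  <value>/path/to/kerby/ranger.keytab</value>
</property>
```

The Ranger Admin service will most likely need a restart before it picks up the changed values.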
Edit the 'cl1_hadoop' service that we created in the second tutorial and click on 'Test Connection'. This should fail, as the service is not yet configured to authenticate to HDFS. Add the following properties:
  • Authentication Type: Kerberos
  • dfs.datanode.kerberos.principal: hdfs/localhost
  • dfs.namenode.kerberos.principal: hdfs/localhost
  • dfs.secondary.namenode.kerberos.principal: hdfs/localhost
Now 'Test Connection' should be successful.
