Tuesday, May 9, 2017

Securing Apache Hadoop Distributed File System (HDFS) - part VI

This is the sixth and final article in a series of posts on securing HDFS. In the second and third posts we looked at how to use Apache Ranger to authorize access to data stored in HDFS. In the fifth post, we looked at how to configure HDFS to authenticate users via Kerberos. In this post we will combine both scenarios, that is, we will use Apache Ranger to authorize access to HDFS, which itself is secured using Kerberos.

1) Authenticating to Apache Ranger

Follow the fifth tutorial to set up HDFS using Kerberos for authentication. Then follow the second tutorial to install the Apache Ranger HDFS plugin. The Ranger HDFS plugin will not be able to download new policies from Apache Ranger, as we have not yet configured Ranger to authenticate clients via Kerberos. Edit 'conf/ranger-admin-site.xml' in the Apache Ranger Admin service and set the following properties (see the sketch after the list):
  • ranger.spnego.kerberos.principal: HTTP/localhost@hadoop.apache.org
  • ranger.spnego.kerberos.keytab: Path to Kerby ranger.keytab
  • hadoop.security.authentication: kerberos
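For reference, a sketch of how these entries might look in 'conf/ranger-admin-site.xml' (the keytab path below is just an example; point it at the ranger.keytab generated by Apache Kerby):

  <property>
    <name>ranger.spnego.kerberos.principal</name>
    <value>HTTP/localhost@hadoop.apache.org</value>
  </property>
  <property>
    <name>ranger.spnego.kerberos.keytab</name>
    <!-- example path only; use the ranger.keytab generated by Kerby -->
    <value>/path/to/kerby-project/target/ranger.keytab</value>
  </property>
  <property>
    <name>hadoop.security.authentication</name>
    <value>kerberos</value>
  </property>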
Now we need to configure Kerberos to use the krb5.conf file generated by Apache Kerby:
  • export JAVA_OPTS="-Djava.security.krb5.conf=<path to Kerby target/krb5.conf>"
Start the Apache Ranger admin service ('sudo -E ranger-admin start' to pass the JAVA_OPTS variable through) and edit the "cl1_hadoop" service that was created in the second tutorial. Under "Add New Configurations" add the following:
  • policy.download.auth.users: hdfs
The Ranger HDFS plugin should now be able to download the policies from the Ranger Admin service and apply authorization accordingly.
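As a quick sanity check that the plugin is enforcing the downloaded policies, authenticate as an end user and try to read some data. This is just a sketch; it assumes the test user 'alice', her Kerby-generated keytab, and the policy on the '/data' directory from the earlier tutorials:

  # Obtain a Kerberos ticket for the test user (keytab path is illustrative)
  kinit -k -t /path/to/kerby-project/target/alice.keytab alice
  # Access should now be permitted or denied according to the Ranger policies
  bin/hadoop fs -cat /data/LICENSE.txt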

2) Authenticating to HDFS

As we have configured HDFS to require Kerberos, we will no longer be able to see the HDFS directories in the Ranger Admin service when creating policies, unless we make some changes to enable the Ranger Admin service to authenticate to HDFS. Edit 'conf/ranger-admin-site.xml' in the Apache Ranger Admin service and set the following properties (sketched below):
  • ranger.lookup.kerberos.principal: ranger/localhost@hadoop.apache.org
  • ranger.lookup.kerberos.keytab: Path to Kerby ranger.keytab
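In 'conf/ranger-admin-site.xml' these entries might look like the following (the keytab path is illustrative, and the Admin service needs a restart to pick the changes up):

  <property>
    <name>ranger.lookup.kerberos.principal</name>
    <value>ranger/localhost@hadoop.apache.org</value>
  </property>
  <property>
    <name>ranger.lookup.kerberos.keytab</name>
    <!-- example path only; use the ranger.keytab generated by Kerby -->
    <value>/path/to/kerby-project/target/ranger.keytab</value>
  </property>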
Edit the 'cl1_hadoop' service that we created in the second tutorial and click on 'Test Connection'. This should fail, as Ranger is not yet configured to authenticate to HDFS. Add the following properties:
  • Authentication Type: Kerberos
  • dfs.datanode.kerberos.principal: hdfs/localhost
  • dfs.namenode.kerberos.principal: hdfs/localhost
  • dfs.secondary.namenode.kerberos.principal: hdfs/localhost
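These values should mirror the service principals that HDFS itself was configured with in the fifth tutorial. As a reminder (a sketch only), the relevant entries in 'etc/hadoop/hdfs-site.xml' look something like the following, with a similar entry for the secondary NameNode:

  <property>
    <name>dfs.namenode.kerberos.principal</name>
    <value>hdfs/localhost</value>
  </property>
  <property>
    <name>dfs.datanode.kerberos.principal</name>
    <value>hdfs/localhost</value>
  </property>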
Now 'Test Connection' should be successful.

4 comments:

  1. Hi,

    I have followed this tutorial in the following order:

    1. I have completed the Hadoop setup as described in Part 1
    2. Enabled Ranger plugin as explained in Part 2
    3. Skipped Part 3 & 4 & 5 and setup as required in the SASL tutorial
    4. Finally did the changes mentioned here.

    However, "Test Connection" still fails in my setup with the following error:

    Connection Failed.
    Unable to retrieve any files using given parameters, You can still save the repository and start creating policies, but you would not be able to use autocomplete for resource names. Check ranger_admin.log for more info.

    org.apache.ranger.plugin.client.HadoopException: Unable to login to Hadoop environment [HDFSTest].
    Unable to login to Hadoop environment [HDFSTest].
    Login failure for admin using password ************.
    Client not found in Kerberos database (6) - Client not found in Kerberos database.
    Identifier doesn't match expected value (906).

    Also, since I have skipped Part 3 my HDFS Service in ranger admin is "HDFSTest" and not "cl1_hadoop".

    Should I make that change too?

    Your help will be much appreciated.
    Thank You
    Shabir

    Replies
    1. This comment has been removed by the author.

    2. Was able to solve the issue.

      I had two copies of the ranger-admin setup and was making the changes explained in this tutorial to the one that was not actually being run.

      It all works well now.

      Thanks!!

  2. Hi,

    To ensure that the KMS service configured in Part IV [1] of this tutorial series works after Kerberos has been enabled, please follow these steps:

    Change the following properties in "kms-site.xml" as follows:
    ("kms-site.xml" file can be found at {$ranger-kms-home}/ews/webapp/WEB-INF/classes/conf/kms-site.xml)

    hadoop.kms.authentication.type=kerberos
    hadoop.kms.authentication.kerberos.keytab={$PATH-TO_THE_KEYTABS_FROM_PART_V[2]}/target/keyadmin.keytab
    hadoop.kms.authentication.kerberos.principal={set it to "*" or "HTTP/localhost"}

    change property "hadoop.kms.proxyuser.ranger.groups" to "hadoop.kms.proxyuser.keyadmin.groups"
    change property "hadoop.kms.proxyuser.ranger.hosts" to "hadoop.kms.proxyuser.keyadmin.hosts"
    change property "hadoop.kms.proxyuser.ranger.users" to "hadoop.kms.proxyuser.keyadmin.users"

    For all three properties, set the value to * (without the quotes). If the properties are not present, add them. An example of the resulting entries is sketched below.
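
    For example, the resulting entries in kms-site.xml might look something like this (a sketch only; the keytab path should point at wherever the Kerby keytabs were generated):

      <property>
        <name>hadoop.kms.authentication.type</name>
        <value>kerberos</value>
      </property>
      <property>
        <name>hadoop.kms.authentication.kerberos.keytab</name>
        <!-- example path only -->
        <value>/path/to/kerby-project/target/keyadmin.keytab</value>
      </property>
      <property>
        <name>hadoop.kms.authentication.kerberos.principal</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.kms.proxyuser.keyadmin.groups</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.kms.proxyuser.keyadmin.hosts</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.kms.proxyuser.keyadmin.users</name>
        <value>*</value>
      </property>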

    Stop both ranger-admin and ranger-kms.
    Start ranger-admin and then ranger-kms.

    Now you should get "Connection Successful" for the kmsdev service when you log into the Ranger admin UI as keyadmin/keyadmin. You will have to change the username/password of the service to keyadmin@hadoop.apache.org/keyadmin.

    You should also be able to retrieve the keys created for this service under keymanager.

    [1] https://coheigea.blogspot.ca/2017/04/securing-apache-hadoop-distributed-file_26.html
    [2] https://coheigea.blogspot.ca/2017/05/securing-apache-hadoop-distributed-file.html


    Thanks
    Shabir
