Wednesday, April 26, 2017

Securing Apache Hadoop Distributed File System (HDFS) - part IV

This is the fourth in a series of blog posts on securing HDFS. The first post described how to install Apache Hadoop, and how to use POSIX permissions and ACLs to restrict access to data stored in HDFS. The second post looked at how to use Apache Ranger to authorize access to data stored in HDFS. The third post looked at how Apache Ranger can create "tag" based authorization policies for HDFS using Apache Atlas. In this post I will look at how you can implement transparent encryption in HDFS using the Apache Ranger Key Management Service (KMS).

1) Install and Configure the Apache Ranger KMS

If you have not already done so, follow the instructions in this tutorial to install the Apache Ranger admin service, and then start it via "sudo ranger-admin start". Open a browser and go to "http://localhost:6080/". Log on with "admin/admin" and click on "Settings". Create a new user corresponding to the name of the user that starts HDFS.

The next step is to install the Apache Ranger KMS. Please follow step (2) in a blog post I wrote last year about this. When installation is complete, start the KMS service with "sudo ranger-kms start". Log out of the Admin UI and then log back in again with the credentials "keyadmin/keyadmin". Click on the "+" button on the "KMS" tab to create a new KMS Service. Specify the following values:
  • Service Name: kmsdev
  • KMS URL: kms://http@localhost:9292/kms
  • Username: keyadmin
  • Password: keyadmin
When the "kmsdev" service has been created then click on it and edit the default policy that has been created. Edit the existing "allow condition" for "hdfs" adding in the user that will be starting HDFS (if not the "hdfs" user itself). Also grant the "CREATE" permission to that user so that we can create keys from the command line, and the "DECRYPT EEK" permission, so that the user can decrypt the data encryption key:


2) Create an encryption zone in HDFS

In your Hadoop distribution (after first following the steps in the first post), edit 'etc/hadoop/core-site.xml' and add the following property:
  • hadoop.security.key.provider.path - kms://http@localhost:9292/kms
Similarly, edit 'etc/hadoop/hdfs-site.xml' and add the following property:
  • dfs.encryption.key.provider.uri - kms://http@localhost:9292/kms
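In XML form, the two entries above look like this in the respective configuration files (adjust the host and port if your Ranger KMS is not running locally on port 9292):

  <!-- etc/hadoop/core-site.xml -->
  <property>
    <name>hadoop.security.key.provider.path</name>
    <value>kms://http@localhost:9292/kms</value>
  </property>

  <!-- etc/hadoop/hdfs-site.xml -->
  <property>
    <name>dfs.encryption.key.provider.uri</name>
    <value>kms://http@localhost:9292/kms</value>
  </property>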
Start HDFS via 'sbin/start-dfs.sh'. Let's create a new encryption key called "enckey" as follows:
  • bin/hadoop key create enckey
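As a quick sanity check, you can also verify from the command line that the key is visible via the configured KMS provider:
  • bin/hadoop key list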
If you go back to the Ranger Admin UI and click on "Encryption / Key Manager" and select the "kmsdev" service, you should be able to see the new key that was created. Now let's create a new encryption zone in HDFS as follows:
  • bin/hadoop fs -mkdir /zone
  • bin/hdfs crypto -createZone -keyName enckey -path /zone
  • bin/hdfs crypto -listZones
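If everything has worked, the last command should print the new zone along with its key, something like:

  /zone  enckey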
That's it! Any data we put into the '/zone' directory will be encrypted transparently with a data encryption key, which in turn is encrypted ("wrapped") by the "enckey" key we created and stored in the Ranger KMS.
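For example, assuming you have some local file called 'data.txt' (a placeholder name), you can copy it into the encryption zone and read it back, with HDFS decrypting it transparently:
  • bin/hadoop fs -put data.txt /zone
  • bin/hadoop fs -cat /zone/data.txt
To convince yourself that the data really is encrypted at rest, the raw (still encrypted) bytes can be read via the '/.reserved/raw' prefix, which skips decryption (this requires HDFS superuser privileges):
  • bin/hadoop fs -cat /.reserved/raw/zone/data.txt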

9 comments:

  1. Hi,

    Can you help me, or guide me to a doc which explains how to get the Ranger KMS to work when Kerberos is enabled for Hadoop?

    In the tutorial set about securing Hadoop, when I reach the final part where we set up authentication to Ranger via Kerberos, the KMS Ranger service does not work.

    Meaning the connection fails for the "kmsdev" service we created and keys are not fetched.

    Can you help me with this?

    Thanks
    Shabir

  2. How are you securing the KMS service? Are you adding properties as per: https://community.hortonworks.com/questions/36929/ranger-kms-kerberos-issue.html

    Replies
    1. This comment has been removed by the author.

    2. Hi,

      Thank you very much for your reply.

      I edited the sample KDC code in [1] to have an additional principal called "keyadmin". I then logged into the ranger-kms as "keyadmin"/"keyadmin" and edited the kmsdev service we created in this tutorial to use the following username and password:

      uname - keyadmin@hadoop.apache.org
      password - keyadmin

      And "Test Connection" fails. The error I get is as listed in [2].

      Where should I set the properties listed in the above link (the one given by you)?

      Inside "hdfs-site.xml"?

      I am trying to get something running. Your help would be much appreciated.

      Thank you

      [1] https://github.com/coheigea/testcases/blob/master/apache/bigdata/kerberos/src/test/java/org/apache/coheigea/bigdata/kerberos/hadoop/HadoopKerbyTest.java

      [2] https://community.hortonworks.com/questions/25385/test-connection-for-ranger-kms-repository-fails.html

    3. This comment has been removed by the author.

  3. I have made the following changes in the listed files and tried as well (to no avail):

    Added these two parameters in "install.properties" and re-ran the ranger-kms setup:

    REPOSITORY_CONFIG_USERNAME=keyadmin@hadoop.apache.org
    REPOSITORY_CONFIG_PASSWORD=keyadmin

    And then:
    ---------------------
    {$ranger-kms-home}/ews/webapp/WEB-INF/classes/conf/kms-site.xml:

    hadoop.kms.authentication.type=kerberos
    hadoop.kms.authentication.kerberos.keytab={$PATH}/testcases/apache/bigdata/kerberos/target/keyadmin.keytab
    hadoop.kms.authentication.kerberos.principal=*

    keyadmin.keytab was created in the same way as the keytabs for "alice" and "bob".
    Also the "http" principal was added to this keytab.
    ---------------------

    {$ranger-kms-home}/ews/webapp/WEB-INF/classes/conf/core-site.xml:

    hadoop.kms.proxyuser.keyadmin.users=*
    hadoop.kms.proxyuser.keyadmin.groups=*
    ---------------------

    The error I am getting is:

    org.apache.ranger.plugin.client.HadoopException: {
    "RemoteException" : {
    "message" : "Unauthorized connection for super-user: keyadmin@hadoop.apache.org from IP 127.0.0.1",.....}

    This post seems to talk about it: https://community.hortonworks.com/questions/9384/unauthorized-connection-for-super-user.html
    But I don't understand which file to give permission to.
    Can you help me please? :)

    Thanks


  4. Hi,

    IT WORKS :)... IT WORKS....

    To get the KMS service configured in Part IV [1] of this tutorial set working, please follow these steps:

    Change the following properties in "kms-site.xml" as follows:
    ("kms-site.xml" file can be found at {$ranger-kms-home}/ews/webapp/WEB-INF/classes/conf/kms-site.xml)

    hadoop.kms.authentication.type=kerberos
    hadoop.kms.authentication.kerberos.keytab={$PATH-TO_THE_KEYTABS_FROM_PART_V[2]}/target/keyadmin.keytab
    hadoop.kms.authentication.kerberos.principal={set it to "*" or "HTTP/localhost"}

    change property "hadoop.kms.proxyuser.ranger.groups" to "hadoop.kms.proxyuser.keyadmin.groups"
    change property "hadoop.kms.proxyuser.ranger.hosts" to "hadoop.kms.proxyuser.keyadmin.hosts"
    change property "hadoop.kms.proxyuser.ranger.users" to "hadoop.kms.proxyuser.keyadmin.users"

    Set the value of all three properties to * (without the quotes). If the properties are not there, then add them.
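    For illustration, the resulting kms-site.xml entries would look something like this (the keytab path is environment-specific):

      <property>
        <name>hadoop.kms.authentication.type</name>
        <value>kerberos</value>
      </property>
      <property>
        <name>hadoop.kms.authentication.kerberos.keytab</name>
        <value>{$PATH-TO_THE_KEYTABS_FROM_PART_V}/target/keyadmin.keytab</value>
      </property>
      <property>
        <name>hadoop.kms.authentication.kerberos.principal</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.kms.proxyuser.keyadmin.users</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.kms.proxyuser.keyadmin.groups</name>
        <value>*</value>
      </property>
      <property>
        <name>hadoop.kms.proxyuser.keyadmin.hosts</name>
        <value>*</value>
      </property>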

    Stop both ranger-admin and ranger-kms.
    Start ranger-admin and then ranger-kms.

    Now you should be able to get "Connection Successful" for the kmsdev service when you log into the Ranger admin UI using keyadmin/keyadmin. You have to change the username/password of the service to keyadmin@hadoop.apache.org/keyadmin.

    You should also be able to retrieve the keys created for this service under the Key Manager.

    [1] https://coheigea.blogspot.ca/2017/04/securing-apache-hadoop-distributed-file_26.html
    [2] https://coheigea.blogspot.ca/2017/05/securing-apache-hadoop-distributed-file.html


    Thanks
    Shabir

    Replies
    1. The comment above describes how to get the KMS setup done in this tutorial working even after configuring Kerberos authentication for Apache Ranger, as explained in [1].

      [1] https://coheigea.blogspot.ca/2017/05/securing-apache-hadoop-distributed-file_9.html?showComment=1496932883888#c8210261410123242180
