This is the fourth in a series of blog posts on securing HDFS. The first post described how to install Apache Hadoop, and how to use POSIX permissions and ACLs to restrict access to data stored in HDFS. The second post looked at how to use Apache Ranger to authorize access to data stored in HDFS. The third post looked at how Apache Ranger can create tag-based authorization policies for HDFS using Apache Atlas. In this post I will look at how to implement transparent encryption in HDFS using the Apache Ranger Key Management Service (KMS).
1) Install and Configure the Apache Ranger KMS
If you have not already done so, follow the instructions in this tutorial to install the Apache Ranger admin service, and then start it via "sudo ranger-admin start". Open a browser and go to "http://localhost:6080/". Log on with "admin/admin" and click on "Settings". Create a new user corresponding to the name of the user that starts HDFS.
The next step is to install the Apache Ranger KMS. Please follow step (2) in a blog post I wrote last year about this. When installation is complete, start the KMS service with "sudo ranger-kms start". Log out of the admin UI and log back in with the credentials "keyadmin/keyadmin". Click on the "+" button on the "KMS" tab to create a new KMS service. Specify the following values:
- Service Name: kmsdev
- KMS URL: kms://http@localhost:9292/kms
- Username: keyadmin
- Password: keyadmin
When the "kmsdev" service has been created then click on it and edit the default policy that has been created. Edit the existing "allow condition" for "hdfs" adding in the user that will be starting HDFS (if not the "hdfs" user itself). Also grant the "CREATE" permission to that user so that we can create keys from the command line, and the "DECRYPT EEK" permission, so that the user can decrypt the data encryption key:
2) Create an encryption zone in HDFS
In your Hadoop distribution (after first following the steps in the
first post), edit 'etc/hadoop/core-site.xml' and add the following property:
- hadoop.security.key.provider.path - kms://http@localhost:9292/kms
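In other words, the entry in 'etc/hadoop/core-site.xml' looks like this:
<property>
  <name>hadoop.security.key.provider.path</name>
  <value>kms://http@localhost:9292/kms</value>
</property>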
Similarly, edit 'etc/hadoop/hdfs-site.xml' and add the following property:
- dfs.encryption.key.provider.uri - kms://http@localhost:9292/kms
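And likewise the entry in 'etc/hadoop/hdfs-site.xml':
<property>
  <name>dfs.encryption.key.provider.uri</name>
  <value>kms://http@localhost:9292/kms</value>
</property>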
Start HDFS via 'sbin/start-dfs.sh'. Let's create a new encryption key called "enckey" as follows:
- bin/hadoop key create enckey
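You can verify that the key was created from the command line as well (the "-metadata" flag also prints the cipher and key length for each key):
- bin/hadoop key list -metadata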
If you go back to the Ranger Admin UI and click on "Encryption / Key Manager" and select the "kmsdev" service, you should be able to see the new key that was created. Now let's create a new encryption zone in HDFS as follows:
- bin/hadoop fs -mkdir /zone
- bin/hdfs crypto -createZone -keyName enckey -path /zone
- bin/hdfs crypto -listZones
That's it! Data that we put into the '/zone' directory will be encrypted with a data encryption key, which in turn is encrypted by the "enckey" key we created and stored in the Ranger KMS.
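For example, assuming some local file called 'data.txt' (the file name here is arbitrary), we can write it into the zone and read it back transparently. A superuser reading the same file via the '/.reserved/raw' prefix should instead see the raw encrypted bytes:
- bin/hadoop fs -put data.txt /zone
- bin/hadoop fs -cat /zone/data.txt
- bin/hadoop fs -cat /.reserved/raw/zone/data.txt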
Hi,
Can you help me, or point me to a doc which explains how to get the Ranger KMS to work when Kerberos is enabled for Hadoop?
In the tutorial set about securing Hadoop, when I reach the final part where we set up authentication to Ranger via Kerberos, the Ranger KMS service does not work. That is, the connection fails for the "kmsdev" service we created and keys are not fetched.
Can you help me with this?
Thanks
Shabir
How are you securing the KMS service? Are you adding properties as per: https://community.hortonworks.com/questions/36929/ranger-kms-kerberos-issue.html
Hi,
Thank you very much for your reply.
I edited the sample KDC code in [1] to add an additional principal called "keyadmin". I then logged into the Ranger admin UI as "keyadmin"/"keyadmin" and edited the kmsdev service we created in this tutorial to use the following username and password:
uname - keyadmin@hadoop.apache.org
password - keyadmin
"Test Connection" fails. The error I get is as listed in [2].
Where should I set the properties listed in the link you gave? Inside "hdfs-site.xml"?
I am trying to get something running. Your help would be much appreciated.
Thank you
[1] https://github.com/coheigea/testcases/blob/master/apache/bigdata/kerberos/src/test/java/org/apache/coheigea/bigdata/kerberos/hadoop/HadoopKerbyTest.java
[2] https://community.hortonworks.com/questions/25385/test-connection-for-ranger-kms-repository-fails.html
I have made the following changes in the listed files and tried as well (to no avail):
I added these two parameters in "install.properties" and re-ran the Ranger KMS setup:
REPOSITORY_CONFIG_USERNAME=keyadmin@hadoop.apache.org
REPOSITORY_CONFIG_PASSWORD=keyadmin
And then:
---------------------
{$ranger-kms-home}/ews/webapp/WEB-INF/classes/conf/kms-site.xml:
hadoop.kms.authentication.type=kerberos
hadoop.kms.authentication.kerberos.keytab={$PATH}/testcases/apache/bigdata/kerberos/target/keyadmin.keytab
hadoop.kms.authentication.kerberos.principal=*
The keyadmin.keytab was created just like the keytabs for "alice" and "bob" are created. The "http" principal was also added to this keytab.
---------------------
{$ranger-kms-home}/ews/webapp/WEB-INF/classes/conf/core-site.xml:
hadoop.kms.proxyuser.keyadmin.users=*
hadoop.kms.proxyuser.keyadmin.groups=*
---------------------
The error I am getting is:
org.apache.ranger.plugin.client.HadoopException: {
"RemoteException" : {
"message" : "Unauthorized connection for super-user: keyadmin@hadoop.apache.org from IP 127.0.0.1",.....}
This post seems to talk about it: https://community.hortonworks.com/questions/9384/unauthorized-connection-for-super-user.html
But I don't understand in which file to grant the permission.
Can you help me please? :)
Thanks
Hi,
IT WORKS :)... IT WORKS....
To get the KMS service configured in Part IV [1] of this tutorial set working after enabling Kerberos, please follow these steps:
Change the following properties in "kms-site.xml" as follows:
("kms-site.xml" file can be found at {$ranger-kms-home}/ews/webapp/WEB-INF/classes/conf/kms-site.xml)
hadoop.kms.authentication.type=kerberos
hadoop.kms.authentication.kerberos.keytab={$PATH-TO_THE_KEYTABS_FROM_PART_V[2]}/target/keyadmin.keytab
hadoop.kms.authentication.kerberos.principal={set it to "*" or "HTTP/localhost"}
change property "hadoop.kms.proxyuser.ranger.groups" to "hadoop.kms.proxyuser.keyadmin.groups"
change property "hadoop.kms.proxyuser.ranger.hosts" to "hadoop.kms.proxyuser.keyadmin.hosts"
change property "hadoop.kms.proxyuser.ranger.users" to "hadoop.kms.proxyuser.keyadmin.users"
For all three properties, set the value to * (without the quotes). If the properties are not there, add them.
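Concretely, the resulting entries in "kms-site.xml" should look something like this (the keytab path below is just a placeholder for wherever your keytabs were generated):
<property>
  <name>hadoop.kms.authentication.type</name>
  <value>kerberos</value>
</property>
<property>
  <name>hadoop.kms.authentication.kerberos.keytab</name>
  <value>/path/to/keytabs/target/keyadmin.keytab</value>
</property>
<property>
  <name>hadoop.kms.authentication.kerberos.principal</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.kms.proxyuser.keyadmin.users</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.kms.proxyuser.keyadmin.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.kms.proxyuser.keyadmin.hosts</name>
  <value>*</value>
</property>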
Stop both ranger-admin and ranger-kms.
Start ranger-admin and then ranger-kms.
Now you should be able to get "Connection Successful" for the kmsdev service when you log into the Ranger admin UI using keyadmin/keyadmin. You have to change the username/password of the service to keyadmin@hadoop.apache.org/keyadmin.
You should also be able to retrieve the keys created for this service under Key Manager.
[1] https://coheigea.blogspot.ca/2017/04/securing-apache-hadoop-distributed-file_26.html
[2] https://coheigea.blogspot.ca/2017/05/securing-apache-hadoop-distributed-file.html
Thanks
Shabir
The comment above explains how to get the KMS setup from this tutorial working even after configuring Kerberos authentication for Apache Ranger, as explained in [1]. (The first line of that comment had some typos which made it unclear what the rest explains.)
[1] https://coheigea.blogspot.ca/2017/05/securing-apache-hadoop-distributed-file_9.html?showComment=1496932883888#c8210261410123242180
Great, thanks for the update!