Friday, May 5, 2017

Using SASL to secure the data transfer protocol in Apache Hadoop

The previous blog article showed how to set up a pseudo-distributed Apache Hadoop cluster such that clients are authenticated using Kerberos. The DataNode that we configured authenticates itself by using privileged ports configured in the properties "dfs.datanode.address" and "dfs.datanode.http.address". This requires building and configuring JSVC as well as making sure that we can ssh to localhost without a password as root. An alternative solution (as noted in the article) is to use SASL to secure the data transfer protocol. Here we will briefly show how to do this, building on the configuration given in the previous post.

1) Configuring Hadoop to use SASL for the data transfer protocol

Follow section (2) of the previous post to configure Hadoop to authenticate users via Kerberos. We need to make the following changes to 'etc/hadoop/hdfs-site.xml':
  • dfs.datanode.address: Change the port number here to be a non-privileged port.
  • dfs.datanode.http.address: Change the port number here to be a non-privileged port.
We also need to add the following properties to 'etc/hadoop/hdfs-site.xml' (a sample snippet covering all of these changes is shown after the list):
  • dfs.data.transfer.protection: integrity.
  • dfs.http.policy: HTTPS_ONLY.
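
For reference, the resulting section of 'etc/hadoop/hdfs-site.xml' (inside its <configuration> element) could look something like the sketch below. The port numbers and bind address are illustrative assumptions, not values from the previous post; any non-privileged port (above 1023) will do:

  <!-- Sketch only: ports 10004/10006 and the 0.0.0.0 bind address are
       illustrative; pick any non-privileged ports that suit your setup -->
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:10004</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:10006</value>
  </property>
  <!-- Protect the data transfer protocol with SASL (integrity level) -->
  <property>
    <name>dfs.data.transfer.protection</name>
    <value>integrity</value>
  </property>
  <!-- Serve the web UIs over HTTPS only -->
  <property>
    <name>dfs.http.policy</name>
    <value>HTTPS_ONLY</value>
  </property>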
Edit 'etc/hadoop/hadoop-env.sh' and comment out the values we added for:
  • HADOOP_SECURE_DN_USER
  • JSVC_HOME
2) Configure SSL keys in ssl-server.xml

The next step is to configure some SSL keys in 'etc/hadoop/ssl-server.xml'. For the purposes of this demo, we'll use some sample keys that are used to run the system tests in Apache CXF. Download cxfca.jks and bob.jks into 'etc/hadoop'. Now edit 'etc/hadoop/ssl-server.xml' and define the following properties (a sample snippet is shown after the list):
  • ssl.server.truststore.location: etc/hadoop/cxfca.jks
  • ssl.server.truststore.password: password
  • ssl.server.keystore.location: etc/hadoop/bob.jks
  • ssl.server.keystore.password: password
  • ssl.server.keystore.keypassword: password
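
Putting that together, the corresponding entries in 'etc/hadoop/ssl-server.xml' (inside its <configuration> element) would look roughly as follows. The relative keystore paths assume Hadoop is started from the distribution root, as in the previous post:

  <!-- Truststore: the CXF sample CA certificate -->
  <property>
    <name>ssl.server.truststore.location</name>
    <value>etc/hadoop/cxfca.jks</value>
  </property>
  <property>
    <name>ssl.server.truststore.password</name>
    <value>password</value>
  </property>
  <!-- Keystore: the "bob" sample key used for the HTTPS endpoint -->
  <property>
    <name>ssl.server.keystore.location</name>
    <value>etc/hadoop/bob.jks</value>
  </property>
  <property>
    <name>ssl.server.keystore.password</name>
    <value>password</value>
  </property>
  <property>
    <name>ssl.server.keystore.keypassword</name>
    <value>password</value>
  </property>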
3) Launch Kerby and HDFS and test authorization

Now that we have hopefully configured everything correctly, it's time to launch the Kerby-based KDC and HDFS. Start Kerby by running the JUnit test as described in the first section of the previous article. Now start HDFS via:
  • sbin/start-dfs.sh
Note that 'sudo sbin/start-secure-dns.sh' is not required, as we are now using SASL for the data transfer protocol. Now we can read the file we added to "/data" in the previous article as "alice":
  • export KRB5_CONFIG=/pathtokerby/target/krb5.conf
  • kinit -k -t /pathtokerby/target/alice.keytab alice
  • bin/hadoop fs -cat /data/LICENSE.txt

2 comments:

  1. Hi,

    Thank you very much for your most informative blog post. However, I noticed some small glitches in trying to get things up and running. By correcting these I was able to get the setup running as explained in your tutorial:

    1. The parameter "ssl.server.truststore.location" should point to the filename "cxfca.jks" and NOT "cxf-ca.jks". This prevents the datanode from starting.

    2. When running the kinit command, the argument "-k" needs to go before "-t". Otherwise it just produces the help output for kinit:

    kinit -k -t /pathtokerby/target/alice.keytab alice


    Thank You Very Much

    Replies
    1. Thanks for the feedback, I updated my post with the fixes you mentioned.
