1) Set up a KDC using Apache Kerby
If we are going to configure Apache Hadoop to use Kerberos to authenticate users, then we need a Kerberos Key Distribution Center (KDC). Most documentation revolves around installing the MIT Kerberos server, adding principals, creating keytabs, etc. However, in this post we will show a simpler way of getting started, using a pre-configured maven project based on Apache Kerby. Apache Kerby is a subproject of the Apache Directory project, and is a complete open-source KDC written entirely in Java.
A github project that uses Apache Kerby to start up a KDC is available here:
- bigdata-kerberos-deployment: This project contains some tests which can be used to test Kerberos with various big data deployments, such as Apache Hadoop.
The KDC is launched by running a JUnit test in the project; it creates the following principals, with associated keytabs and a krb5.conf file written to the project's "target" directory:
- alice@hadoop.apache.org
- bob@hadoop.apache.org
- hdfs/localhost@hadoop.apache.org
- HTTP/localhost@hadoop.apache.org
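Under the covers, the test does roughly the following using Kerby's SimpleKdcServer API. This is a minimal sketch rather than the exact code of the github project; the port number and the keytab passwords are assumptions, and the keytabs and krb5.conf are written to the "target" directory referenced later in this post:

    import java.io.File;
    import org.apache.kerby.kerberos.kerb.server.SimpleKdcServer;

    public class KerbyKdcLauncher {
        public static void main(String[] args) throws Exception {
            SimpleKdcServer kdc = new SimpleKdcServer();
            kdc.setKdcRealm("hadoop.apache.org");
            kdc.setKdcHost("localhost");
            kdc.setKdcTcpPort(60088);               // assumption: any free port will do
            kdc.setAllowUdp(false);
            kdc.setWorkDir(new File("target"));     // krb5.conf is generated here
            kdc.init();
            kdc.start();

            // Create the user and service principals and export their keytabs
            kdc.createPrincipal("alice@hadoop.apache.org", "alice");
            kdc.exportPrincipal("alice@hadoop.apache.org", new File("target/alice.keytab"));
            kdc.createPrincipal("bob@hadoop.apache.org", "bob");
            kdc.exportPrincipal("bob@hadoop.apache.org", new File("target/bob.keytab"));
            kdc.createPrincipal("hdfs/localhost@hadoop.apache.org", "hdfs");
            kdc.exportPrincipal("hdfs/localhost@hadoop.apache.org", new File("target/hdfs.keytab"));
            // Assumption: exportPrincipal adds to an existing keytab, so the SPNEGO
            // principal can live in the same hdfs.keytab used by the NameNode/DataNode
            kdc.createPrincipal("HTTP/localhost@hadoop.apache.org", "http");
            kdc.exportPrincipal("HTTP/localhost@hadoop.apache.org", new File("target/hdfs.keytab"));

            // Keep the KDC running while HDFS is configured and tested
            Thread.sleep(Long.MAX_VALUE);
        }
    }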
2) Configure Hadoop to authenticate users via Kerberos
Download and configure Apache Hadoop as per the first tutorial. For now, we will not enable the Ranger authorization plugin, but rather secure access to the "/data" directory using ACLs, as described in section (3) of the first tutorial, such that "alice" has permission to read the file stored in "/data" but "bob" does not. The next step is to configure Hadoop to authenticate users via Kerberos.
Edit 'etc/hadoop/core-site.xml' and add the following property name/value:
- hadoop.security.authentication: kerberos
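In Hadoop's XML configuration format, this corresponds to a property entry along the following lines:

    <configuration>
        <property>
            <name>hadoop.security.authentication</name>
            <value>kerberos</value>
        </property>
    </configuration>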
Next, edit 'etc/hadoop/hdfs-site.xml' and add the following property name/values:
- dfs.namenode.keytab.file: Path to Kerby hdfs.keytab (see above).
- dfs.namenode.kerberos.principal: hdfs/localhost@hadoop.apache.org
- dfs.namenode.kerberos.internal.spnego.principal: HTTP/localhost@hadoop.apache.org
- dfs.datanode.data.dir.perm: 700
- dfs.datanode.address: 0.0.0.0:1004
- dfs.datanode.http.address: 0.0.0.0:1006
- dfs.web.authentication.kerberos.principal: HTTP/localhost@hadoop.apache.org
- dfs.datanode.keytab.file: Path to Kerby hdfs.keytab (see above).
- dfs.datanode.kerberos.principal: hdfs/localhost@hadoop.apache.org
- dfs.block.access.token.enable: true
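Each of these is again a property entry in the XML configuration file. A representative subset is sketched below; the keytab path is a placeholder for wherever your Kerby project checkout lives:

    <configuration>
        <property>
            <name>dfs.namenode.keytab.file</name>
            <value>/pathtokerby/target/hdfs.keytab</value>
        </property>
        <property>
            <name>dfs.namenode.kerberos.principal</name>
            <value>hdfs/localhost@hadoop.apache.org</value>
        </property>
        <property>
            <name>dfs.block.access.token.enable</name>
            <value>true</value>
        </property>
        <!-- ...the remaining dfs.* properties listed above follow the same pattern... -->
    </configuration>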
Finally, edit 'etc/hadoop/hadoop-env.sh' and add the following environment variables:
- export HADOOP_SECURE_DN_USER=(the user you are running HDFS as)
- export JSVC_HOME=(path to JSVC as above)
- export HADOOP_OPTS="-Djava.security.krb5.conf=<path to Kerby target/krb5.conf>"
3) Launch Kerby and HDFS and test authorization
Now that we have (hopefully) configured everything correctly, it's time to launch the Kerby-based KDC and HDFS. Start Kerby by running the JUnit test as described in the first section. Then start HDFS via:
- sbin/start-dfs.sh
- sudo sbin/start-secure-dns.sh
To test authorization, first obtain a Kerberos ticket for "alice", who is authorized to read the file, using the Kerby-generated krb5.conf and keytab:
- export KRB5_CONFIG=/pathtokerby/target/krb5.conf
- kinit -k -t /pathtokerby/target/alice.keytab alice
- bin/hadoop fs -cat /data/LICENSE.txt
This should succeed. Now destroy the ticket and repeat the same steps as "bob", who is not authorized to read the file:
- kdestroy
- kinit -k -t /pathtokerby/target/bob.keytab bob
- bin/hadoop fs -cat /data/LICENSE.txt
This time the request should be rejected, as "bob" does not have permission to read the file.
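If you prefer to verify the same behaviour from Java rather than via kinit and the Hadoop CLI, a keytab login can be done with Hadoop's UserGroupInformation API. This is only a sketch: the NameNode address ("hdfs://localhost:9000") and the keytab/krb5.conf paths are assumptions based on the setup above:

    import java.io.InputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.security.UserGroupInformation;

    public class KerberosReadTest {
        public static void main(String[] args) throws Exception {
            // Point the JVM at the Kerby-generated krb5.conf (path is an assumption)
            System.setProperty("java.security.krb5.conf", "/pathtokerby/target/krb5.conf");

            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://localhost:9000"); // assumption: NameNode address from the first tutorial
            conf.set("hadoop.security.authentication", "kerberos");

            // Log in from the keytab instead of relying on a ticket obtained via kinit
            UserGroupInformation.setConfiguration(conf);
            UserGroupInformation.loginUserFromKeytab("alice", "/pathtokerby/target/alice.keytab");

            // Read the file; repeating this with bob's keytab should fail with an AccessControlException
            try (FileSystem fs = FileSystem.get(conf)) {
                try (InputStream in = fs.open(new Path("/data/LICENSE.txt"))) {
                    IOUtils.copyBytes(in, System.out, 4096, false);
                }
            }
        }
    }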