1) Install Apache Hadoop
First, follow the steps outlined in the earlier tutorial (section 1) on setting up Apache Hadoop, except that in this tutorial we will work with Apache Hadoop 2.8.2. We also need a few additional steps to configure Yarn (see the official Hadoop documentation for details). Create a new file called 'etc/hadoop/mapred-site.xml' with the content:
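A minimal sketch of this file, assuming the standard single-node setting from the Hadoop documentation ('mapreduce.framework.name' set to 'yarn') and that the commands are run from the Hadoop installation directory:

```shell
# Run from the Hadoop installation directory (assumption).
# mkdir -p is a no-op there, because etc/hadoop already exists.
mkdir -p etc/hadoop

# Tell MapReduce jobs to run on Yarn rather than in local mode.
cat > etc/hadoop/mapred-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF
```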
Next, edit 'etc/hadoop/yarn-site.xml' and add:
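Again assuming the standard single-node settings from the Hadoop documentation, the property to add enables the MapReduce shuffle service in the NodeManager. This sketch overwrites the whole file, so it assumes the file still holds only the empty configuration element of a fresh install:

```shell
# Run from the Hadoop installation directory (assumption).
mkdir -p etc/hadoop

# Overwrites yarn-site.xml: only safe on a fresh install where the file
# contains no other properties yet.
cat > etc/hadoop/yarn-site.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
```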
Now we can start Apache Yarn via 'sbin/start-yarn.sh'. We are going to submit jobs as a local user called "alice" to test authorization. First we need to create some directories in HDFS and upload some input data, and then we can submit an example MapReduce job as "alice":
- bin/hdfs dfs -mkdir -p /user/alice/input
- bin/hdfs dfs -put etc/hadoop/*.xml /user/alice/input
- bin/hadoop fs -chown -R alice /user/alice
- bin/hadoop fs -mkdir /tmp
- bin/hadoop fs -chmod og+w /tmp
- sudo -u alice bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar grep input output 'dfs[a-z.]+'
2) Install the Apache Ranger Yarn plugin
Next we will install the Apache Ranger Yarn plugin. Download Apache Ranger and verify that the signature is valid and that the message digests match. Because of some bugs in the installation process that have since been fixed, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
- mvn clean package assembly:assembly -DskipTests
- tar zxvf target/ranger-1.0.0-SNAPSHOT-yarn-plugin.tar.gz
- mv ranger-1.0.0-SNAPSHOT-yarn-plugin ${ranger.yarn.home}
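The digest check mentioned above can be scripted. A minimal sketch, assuming the release publishes a SHA-512 digest; the function name verify_sha512 is hypothetical. It lowercases the expected value and strips whitespace, so a digest copied from a file that groups hex digits with spaces still compares correctly:

```shell
# Hypothetical helper: succeed only if the SHA-512 digest of the file $1
# matches the expected hex digest $2.
verify_sha512() {
  archive="$1"
  # Drop spaces, tabs and newlines, and lowercase the hex digits.
  expected=$(printf '%s' "$2" | tr -d ' \n\t' | tr 'A-F' 'a-f')
  # Hash via stdin so the output is "<digest>  -"; keep only the digest.
  actual=$(sha512sum < "$archive" | cut -d ' ' -f 1)
  if [ "$actual" = "$expected" ]; then
    echo "OK: $archive"
  else
    echo "MISMATCH: $archive" >&2
    return 1
  fi
}
```

For example, `verify_sha512 <archive> "<published digest>"`. The signature is checked separately, for example with `gpg --verify <archive>.asc <archive>` after importing the Apache Ranger KEYS file.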
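Now go to ${ranger.yarn.home} and edit 'install.properties'. Set the following properties:
- POLICY_MGR_URL: Set this to "http://localhost:6080".
- REPOSITORY_NAME: Set this to "YarnTest".
- COMPONENT_INSTALL_DIR_NAME: Set this to the location of your Apache Hadoop installation.
Save 'install.properties' and install the plugin as root by running 'sudo -E ./enable-yarn-plugin.sh' from ${ranger.yarn.home}.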
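The 'install.properties' changes can also be scripted. A minimal sketch; the helper name set_property and the variables RANGER_YARN_HOME (standing in for ${ranger.yarn.home}) and HADOOP_HOME, with their defaults, are hypothetical. It uses GNU sed's in-place syntax (BSD sed needs `sed -i ''`), and values containing '|' or '&' would need escaping:

```shell
# Hypothetical helper: set KEY=VALUE in a properties file, replacing an
# existing "KEY=" line, or appending one if the key is missing.
set_property() {
  file="$1"; key="$2"; value="$3"
  if grep -q "^${key}=" "$file" 2>/dev/null; then
    # '|' is the sed delimiter because the values contain '/'.
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
  else
    printf '%s=%s\n' "$key" "$value" >> "$file"
  fi
}

# Defaults are assumptions: run from ${ranger.yarn.home}, with Hadoop
# installed in a hypothetical /opt/hadoop-2.8.2.
RANGER_YARN_HOME="${RANGER_YARN_HOME:-.}"
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop-2.8.2}"
props="$RANGER_YARN_HOME/install.properties"

set_property "$props" POLICY_MGR_URL "http://localhost:6080"
set_property "$props" REPOSITORY_NAME "YarnTest"
set_property "$props" COMPONENT_INSTALL_DIR_NAME "$HADOOP_HOME"
```

Replacing lines in place, rather than always appending, keeps exactly one definition of each key, so the script can be re-run safely.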
Finally, restart Yarn and resubmit the job as "alice", as in section 1. You should now see an authorization error: "User alice cannot submit applications to queue root.default".
3) Create authorization policies in the Apache Ranger Admin console
Next we will use the Apache Ranger admin console to create authorization policies for Yarn. Follow the steps in this tutorial to install the Apache Ranger admin service. Start it with "sudo ranger-admin start", then open a browser, go to "http://localhost:6080/" and log in with "admin/admin". Add a new Yarn service with the following configuration values:
- Service Name: YarnTest
- Username: admin
- Password: admin
- Yarn REST URL: http://localhost:8088
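As an alternative to the web form, the same service could be created through the Apache Ranger public REST API. This is a sketch that assumes the v2 endpoint '/service/public/v2/api/service' and assumes that the Yarn service definition names its URL setting 'yarn.url'. Check both against your Ranger version. The block only writes the request body to a file, and the file name is hypothetical:

```shell
# Request body mirroring the form values above (hypothetical file name).
cat > yarn-service.json <<'EOF'
{
  "name": "YarnTest",
  "type": "yarn",
  "configs": {
    "username": "admin",
    "password": "admin",
    "yarn.url": "http://localhost:8088"
  }
}
EOF
```

It could then be sent with `curl -u admin:admin -H 'Content-Type: application/json' -X POST -d @yarn-service.json http://localhost:6080/service/public/v2/api/service`.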
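Once the service is saved, click on the "YarnTest" service and add a new policy that grants the user "alice" the "submit-app" permission on the queue "root.default". Allow up to 30 seconds for the Apache Ranger plugin to download the new authorization policy from the admin service, then re-run the job as "alice". This time it should succeed because of the policy we have just created.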
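The authorization policy for "alice" can also be expressed as a REST request body. This is a sketch that assumes the v2 endpoint '/service/public/v2/api/policy', the Yarn service definition's 'queue' resource, and its 'submit-app' access type. The policy name and the file name are hypothetical:

```shell
# Grant "alice" permission to submit applications to root.default.
cat > alice-submit-policy.json <<'EOF'
{
  "service": "YarnTest",
  "name": "alice-submit-root-default",
  "resources": {
    "queue": { "values": ["root.default"] }
  },
  "policyItems": [
    {
      "users": ["alice"],
      "accesses": [ { "type": "submit-app", "isAllowed": true } ]
    }
  ]
}
EOF
```

It could then be posted with `curl -u admin:admin -H 'Content-Type: application/json' -X POST -d @alice-submit-policy.json http://localhost:6080/service/public/v2/api/policy`.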