1) Install the Apache Ranger HDFS plugin
First we will install the Apache Ranger HDFS plugin. Follow the steps in the previous tutorial to set up Apache Hadoop, if you have not done this already. Then download Apache Ranger and verify that the signature is valid and that the message digests match. Because some bugs in the installation process have since been fixed, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
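- mvn clean package assembly:assembly -DskipTests
- tar zxvf target/ranger-1.0.0-SNAPSHOT-hdfs-plugin.tar.gz
- mv ranger-1.0.0-SNAPSHOT-hdfs-plugin ${ranger.hdfs.home}
Now go to ${ranger.hdfs.home} and edit "install.properties" with the following changes:
- POLICY_MGR_URL: Set this to "http://localhost:6080"
- REPOSITORY_NAME: Set this to "HDFSTest"
- COMPONENT_INSTALL_DIR_NAME: The location of your Apache Hadoop installation
Save "install.properties" and install the plugin as root via "sudo ./enable-hdfs-plugin.sh". Then start HDFS:
- sbin/start-dfs.sh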
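If you prefer not to edit the properties by hand, the three values above can be set from a script. The following is only a minimal sketch: the helper name "set_prop" is my own, it assumes the values live in the plugin's "install.properties" as plain KEY=VALUE lines, and "$HADOOP_HOME" stands in for wherever your Hadoop installation actually is.

```shell
#!/bin/sh
# set_prop FILE KEY VALUE  (hypothetical helper, not part of Apache Ranger)
# Drops any existing "KEY=..." line from FILE and appends "KEY=VALUE".
set_prop() {
    file=$1 key=$2 value=$3
    tmp=$(mktemp) || return 1
    # Copy every line except an existing assignment of this key.
    grep -v "^${key}=" "$file" > "$tmp"
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Write back through the original file so its permissions are kept.
    cat "$tmp" > "$file" && rm -f "$tmp"
}

# Example, run from ${ranger.hdfs.home} (HADOOP_HOME is an assumption):
# set_prop install.properties POLICY_MGR_URL http://localhost:6080
# set_prop install.properties REPOSITORY_NAME HDFSTest
# set_prop install.properties COMPONENT_INSTALL_DIR_NAME "$HADOOP_HOME"
```

Note that the helper moves the edited key to the end of the file, which should not matter for a properties file but does change its layout.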
2) Create authorization policies in the Apache Ranger admin console
Next we will use the Apache Ranger admin console to create authorization policies for our data in HDFS. Follow the steps in this tutorial to install the Apache Ranger admin service. Start the service with "sudo ranger-admin start", open "http://localhost:6080/" in a browser, and log on with "admin/admin". Add a new HDFS service with the following configuration values:
- Service Name: HDFSTest
- Username: admin
- Password: admin
- Namenode URL: hdfs://localhost:9000
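Once the service is saved, you may want to confirm from the command line that the admin service knows about it. The sketch below assumes that the admin service exposes its public v2 REST API under "/service/public/v2/api/service/name/<name>"; check that path against the REST documentation for your Ranger version before relying on it. The helper name "ranger_service_url" is made up for this post.

```shell
#!/bin/sh
# ranger_service_url BASE NAME  (hypothetical helper)
# Prints the assumed public REST URL for looking up a service by name.
ranger_service_url() {
    base=${1%/}   # drop one trailing slash, if present
    printf '%s/service/public/v2/api/service/name/%s\n' "$base" "$2"
}

# Example (needs a running admin service, so it is left commented out):
# curl -s -u admin:admin -H 'Accept: application/json' \
#     "$(ranger_service_url http://localhost:6080/ HDFSTest)"
```

If the lookup works, the response should be a JSON description of the "HDFSTest" service; an error response suggests the service name or the credentials are wrong.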
3) Testing authorization in HDFS
Now let's see the Ranger authorization policy we created above in action. Note that by default the HDFS authorization plugin first checks for a Ranger authorization policy that grants access, and if none applies it falls back to the default POSIX permissions. The Ranger authorization plugin pulls policies from the Admin service every 30 seconds by default. For the "HDFSTest" example above, the policies are cached in "/etc/ranger/HDFSTest/policycache/" by default. Make sure that the user you are running Hadoop as can access this directory.
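A quick way to check the cache directory is a small shell function like the one below, run as the user that runs Hadoop. It is only a sketch: the function name is my own, and it assumes that the cached policies are written as "*.json" files, which you should confirm on your own installation.

```shell
#!/bin/sh
# check_policy_cache DIR  (hypothetical helper)
# Returns 0 if DIR is accessible and holds at least one *.json file,
# 1 if DIR is missing or not accessible, 2 if it holds no *.json files.
check_policy_cache() {
    dir=$1
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -r "$dir" ] || [ ! -x "$dir" ]; then
        echo "not accessible to $(id -un): $dir"
        return 1
    fi
    count=0
    for f in "$dir"/*.json; do
        if [ -f "$f" ]; then count=$((count + 1)); fi
    done
    if [ "$count" -eq 0 ]; then
        echo "empty: $dir (no policies pulled yet?)"
        return 2
    fi
    echo "ok: $dir ($count policy file(s))"
}

# Example:
# check_policy_cache /etc/ranger/HDFSTest/policycache
```

An "empty" result right after startup may simply mean the plugin has not yet completed its first pull from the Admin service.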
Now let's check whether each user can read the data file:
- bin/hadoop fs -cat /data/LICENSE* (this should work via the underlying POSIX permissions)
- sudo -u alice bin/hadoop fs -cat /data/LICENSE* (this should work via the Ranger authorization policy)
- sudo -u bob bin/hadoop fs -cat /data/LICENSE* (this should fail as we don't have an authorization policy for "bob").
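The three checks above can be wrapped in a small helper so that the expected outcome is stated next to each command. This is a sketch rather than anything Ranger provides: "expect_access" is a made-up name, and it assumes that "hadoop fs -cat" exits with a non-zero status when access is denied. Note that any other failure (a missing file, HDFS not running) will also be reported as "deny".

```shell
#!/bin/sh
# expect_access EXPECTED CMD...  (hypothetical helper)
# EXPECTED is "allow" or "deny". Runs CMD quietly and compares its
# exit status (zero = allow, non-zero = deny) with the expectation.
expect_access() {
    expected=$1
    shift
    if "$@" >/dev/null 2>&1; then
        actual=allow
    else
        actual=deny
    fi
    if [ "$actual" = "$expected" ]; then
        echo "PASS ($actual): $*"
    else
        echo "FAIL (expected $expected, got $actual): $*"
        return 1
    fi
}

# Example, run from the Hadoop installation directory. The glob is quoted
# so that the local shell leaves it for Hadoop to expand:
# expect_access allow bin/hadoop fs -cat '/data/LICENSE*'
# expect_access allow sudo -u alice bin/hadoop fs -cat '/data/LICENSE*'
# expect_access deny  sudo -u bob bin/hadoop fs -cat '/data/LICENSE*'
```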