1) Installing Apache Hadoop
The first step is to download and extract Apache Hadoop. This tutorial uses version 2.7.3. The next step is to configure Apache Hadoop as a single-node cluster, so that we can easily get it up and running on a local machine. You will need to follow the steps outlined in the previous link to install ssh and pdsh. If you can't log in to localhost without a password ("ssh localhost"), then you need to follow the instructions given in the link about setting up passphraseless ssh.
In addition, we want to run Apache Hadoop in pseudo-distributed mode, where each Hadoop daemon runs as a separate Java process. Edit 'etc/hadoop/core-site.xml' and add:
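Following the standard pseudo-distributed configuration from the Hadoop single-node setup documentation, this property points clients at a NameNode running on the local machine (port 9000 is the conventional value used in those docs):

```xml
<configuration>
    <!-- Default filesystem URI: an HDFS NameNode on localhost -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>
```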
Next edit 'etc/hadoop/hdfs-site.xml' and add:
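Again following the single-node setup guide, reduce the block replication factor to 1, since a pseudo-distributed cluster has only one DataNode:

```xml
<configuration>
    <!-- Only one DataNode is running, so don't try to replicate blocks -->
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>
```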
Make sure that the JAVA_HOME variable in 'etc/hadoop/hadoop-env.sh' is correct, and then format the filesystem and start Hadoop via:
- bin/hdfs namenode -format
- sbin/start-dfs.sh
Now create a directory in HDFS, upload the LICENSE.txt that ships with the Hadoop distribution, and read it back:
- bin/hadoop fs -mkdir /data
- bin/hadoop fs -put LICENSE.txt /data
- bin/hadoop fs -ls /data
- bin/hadoop fs -cat /data/*
2) Securing HDFS using POSIX permissions
We've seen how to access some data stored in HDFS via the command line. Now how can we create some authorization policies to restrict who can access this data? The simplest way is to use the standard POSIX permissions. If we look at the LICENSE.txt file stored in '/data', we see that it has the permissions "-rw-r--r--", which means that other users can read it. We can remove read access for all users apart from the owner via:
- bin/hadoop fs -chmod og-r /data/*
If we now try to read the file as a different local user, "alice", access is denied:
- sudo -u alice bin/hadoop fs -cat /data/*
3) Securing HDFS using ACLs
Securing access to data stored in HDFS via POSIX permissions works fine; however, it does not allow you, for example, to specify fine-grained permissions for users other than the file owner. What if we want to allow "alice" from the previous section to read the file, but not "bob"? We can achieve this via Hadoop ACLs. To enable ACLs, we will need to add a property called "dfs.namenode.acls.enabled" with value "true" to 'etc/hadoop/hdfs-site.xml' and restart HDFS.
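The new property sits alongside the existing entries inside the <configuration> element of 'etc/hadoop/hdfs-site.xml':

```xml
<!-- Enable POSIX-style ACL support on the NameNode -->
<property>
    <name>dfs.namenode.acls.enabled</name>
    <value>true</value>
</property>
```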
We can grant read access to 'alice' via:
- bin/hadoop fs -setfacl -m user:alice:r-- /data/*
- bin/hadoop fs -setfacl -m user:alice:r-x /data
Note that "alice" also needs execute permission on the '/data' directory to reach the file. We can verify that the ACL entries have been applied via:
- bin/hadoop fs -getfacl /data/LICENSE.txt
The ACLs can be removed again via:
- bin/hadoop fs -setfacl -b /data
- bin/hadoop fs -setfacl -b /data/LICENSE.txt