Tuesday, August 28, 2018

Exploring Apache Knox - part I

Apache Knox is an application gateway for the REST APIs and User Interfaces of many of the most popular big data projects. Rather than having REST or browser clients interact directly with the individual components of an Apache Hadoop cluster, for example, it can be convenient to require them to go through Apache Knox instead. In particular, Apache Knox supports a wide range of mechanisms for securing access to the backend cluster. In this series of posts we will look at different ways of securing access to an Apache Hadoop filesystem via Apache Knox. In this first post, we will look at accessing a file stored in HDFS via Apache Knox, where the gateway authenticates the user via HTTP Basic Authentication.

1) Set up Apache Hadoop

To start we assume that an Apache Hadoop cluster is already running, with a file stored in "/data/LICENSE.txt" that we want to access. To see how to set up Apache Hadoop in such a way, please refer to part 1 of this earlier post. Ensure that you can download the LICENSE.txt file in a browser directly from Apache Hadoop via:
  • http://localhost:9870/webhdfs/v1/data/LICENSE.txt?op=OPEN
Note that port "9870" is the default for Apache Hadoop 3.x; the default NameNode HTTP port for Apache Hadoop 2.x is "50070" instead.
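If you prefer the command line, the same check can be performed with curl (the "-L" flag is needed to follow the redirect from the NameNode to a DataNode, which is how WebHDFS serves file content):
  • curl -L "http://localhost:9870/webhdfs/v1/data/LICENSE.txt?op=OPEN"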

2) Set up Apache Knox

Next we will see how to access the file above via Apache Knox. Download and extract Apache Knox (Gateway Server binary archive - version 1.1.0 was used in this tutorial). First we create a master secret via:
  • bin/knoxcli.sh create-master
Next we start a demo LDAP server that ships with Apache Knox for convenience:
  • bin/ldap.sh start
We can authenticate using the credentials "guest" and "guest-password" that are stored in the LDAP backend.
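As a sanity check, we can query the demo LDAP server directly. The following sketch assumes the defaults that ship with Apache Knox: the server listens on port 33389, and the "guest" user is defined under "ou=people,dc=hadoop,dc=apache,dc=org" in "conf/users.ldif":
  • ldapsearch -x -H ldap://localhost:33389 -D "uid=guest,ou=people,dc=hadoop,dc=apache,dc=org" -w guest-password -b "ou=people,dc=hadoop,dc=apache,dc=org" "(uid=guest)"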

Apache Knox stores the "topologies" configuration in the directory "conf/topologies". We will re-use the default "sandbox.xml" configuration for the purposes of this post; this topology is exposed under the URI "gateway/sandbox". The file contains the authentication configuration for the topology (HTTP Basic Authentication) and maps the received credentials to the LDAP backend we started above. It then defines the backend services that are supported by this topology. We are interested in the "WEBHDFS" service, which maps to "http://localhost:50070/webhdfs". Change this port to "9870" if using Apache Hadoop 3.0.0 as in the first section of this post.
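For reference, after the change the "WEBHDFS" service entry in "conf/topologies/sandbox.xml" should look roughly as follows (an illustrative excerpt; the surrounding file contains the rest of the topology configuration):

    <service>
        <role>WEBHDFS</role>
        <url>http://localhost:9870/webhdfs</url>
    </service>

Then start the gateway via: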
  • bin/gateway.sh start
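The gateway listens on port 8443 by default. If it does not come up cleanly, the gateway log is the first place to look (assuming the default log location under the Knox installation directory):
  • tail -f logs/gateway.log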
Now we can access our file through Knox in a browser, entering the credentials "guest" / "guest-password" when prompted:
  • https://localhost:8443/gateway/sandbox/webhdfs/v1/data/LICENSE.txt?op=OPEN
Or alternatively using curl ("-k" is needed as the gateway uses a self-signed TLS certificate by default, and "-L" follows the redirect to the file content):
  • curl -u guest:guest-password -kL "https://localhost:8443/gateway/sandbox/webhdfs/v1/data/LICENSE.txt?op=OPEN"
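Other WebHDFS operations work through the gateway in the same way, for example listing the contents of the "/data" directory via the standard "LISTSTATUS" operation:
  • curl -u guest:guest-password -k "https://localhost:8443/gateway/sandbox/webhdfs/v1/data?op=LISTSTATUS"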
