Wednesday, August 29, 2018

Exploring Apache Knox - part II

This is the second in a series of blog posts exploring some of the security features of Apache Knox. The first post looked at accessing a file stored in HDFS via Apache Knox, where the Apache Knox gateway authenticated the user via Basic Authentication. In this post we will look at authenticating to the REST API of Apache Knox using a token rather than Basic Authentication. Apache Knox ships with a token service that allows an authenticated user to obtain a token, which can then be used to invoke the REST API.

1) Set up the Apache Knox token service

To start with, follow the first tutorial to set up Apache Knox as well as the backend Apache Hadoop cluster we are trying to obtain a file from. Now we will create a new topology configuration file in Apache Knox to launch the token service. Copy "conf/topologies/sandbox.xml" to a new file called "conf/topologies/token.xml". Leave the 'gateway/provider' section as it is, as we want the user to authenticate to the token service using Basic Authentication, just as for the REST API in the previous post. Remove all of the 'service' definitions and add a service definition for the Knox token service.
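The topology edit described above can also be scripted. The following is only a minimal sketch: the inline XML is a trimmed stand-in for "conf/topologies/sandbox.xml", and the service role "KNOXTOKEN" and the "knox.token.ttl" parameter are assumptions taken from the Apache Knox documentation rather than from this post, so check them against your Knox version.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for conf/topologies/sandbox.xml (assumption: the real
# file has more providers and services than are shown here).
SANDBOX_XML = """<topology>
  <gateway>
    <provider>
      <role>authentication</role>
      <name>ShiroProvider</name>
      <enabled>true</enabled>
    </provider>
  </gateway>
  <service>
    <role>WEBHDFS</role>
    <url>http://localhost:50070/webhdfs</url>
  </service>
</topology>"""


def make_token_topology(sandbox_xml: str, ttl_ms: int = 36000000) -> str:
    """Keep the gateway/provider section, replace all services with the token service."""
    root = ET.fromstring(sandbox_xml)

    # Remove every existing 'service' definition (findall returns a list,
    # so removing while iterating is safe).
    for service in root.findall("service"):
        root.remove(service)

    # Add the token service. Role and parameter names are assumptions
    # based on the Knox documentation.
    service = ET.SubElement(root, "service")
    ET.SubElement(service, "role").text = "KNOXTOKEN"
    param = ET.SubElement(service, "param")
    ET.SubElement(param, "name").text = "knox.token.ttl"
    ET.SubElement(param, "value").text = str(ttl_ms)

    return ET.tostring(root, encoding="unicode")


# The result is what would be saved as conf/topologies/token.xml.
print(make_token_topology(SANDBOX_XML))
```

The authentication provider is left untouched, so the token service is still protected by Basic Authentication.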
Restart Apache Knox. We can then obtain a token via the token service as follows using curl:
  • curl -u guest:guest-password -k https://localhost:8443/gateway/token/knoxtoken/api/v1/token
This returns a JSON structure containing an access token (in JWT format), as well as a "token_type" attribute of "Bearer" and an expiry timestamp. The access token itself can be introspected (via e.g. https://jwt.io/). In the example above, the header specifies the algorithm "RS256", indicating that it is a signed token (RSA + SHA-256), and the payload contains attributes identifying the subject ("guest"), the issuer ("KNOXSSO") and an expiry timestamp.
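The token can also be inspected locally instead of pasting it into a website. The following is a minimal sketch that decodes the header and payload of a JWT without verifying the signature, so it is suitable for inspection only. The sample token is built in the code purely for illustration (its expiry value and signature are placeholders) and is not a real Knox token.

```python
import base64
import json
from typing import Tuple


def b64url_decode(segment: str) -> bytes:
    """Decode a base64url segment, restoring the padding that JWTs strip."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def decode_jwt_unverified(token: str) -> Tuple[dict, dict]:
    """Return (header, payload) of a compact JWT. The signature is NOT checked."""
    header_b64, payload_b64, _signature = token.split(".")
    header = json.loads(b64url_decode(header_b64))
    payload = json.loads(b64url_decode(payload_b64))
    return header, payload


def b64url_encode(data: bytes) -> str:
    """Helper used only to build the illustrative sample token below."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


# Illustrative token with the attributes described in the text.
sample_token = ".".join([
    b64url_encode(json.dumps({"alg": "RS256"}).encode()),
    b64url_encode(json.dumps({"sub": "guest", "iss": "KNOXSSO", "exp": 1535587200}).encode()),
    "signature-placeholder",
])

header, payload = decode_jwt_unverified(sample_token)
print(header["alg"], payload["sub"], payload["iss"], payload["exp"])
```

To inspect a real token, pass the "access token" value returned by the token service to decode_jwt_unverified.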

2) Invoking the REST API using a token

The next step is to invoke the REST API using a token, instead of using Basic Authentication as in the previous tutorial. Copy "conf/topologies/sandbox.xml" to "conf/topologies/sandbox-token.xml". Remove the Shiro provider and in its place add a provider that accepts the token issued by the token service.
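As a sketch of this change, the provider swap can be expressed in a few lines of Python. The inline XML is again a trimmed stand-in for "conf/topologies/sandbox.xml", and the provider role "federation" with name "JWTProvider" is an assumption taken from the Apache Knox documentation rather than from this post.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for conf/topologies/sandbox.xml (assumption: the real
# file contains further providers and services).
SANDBOX_XML = """<topology>
  <gateway>
    <provider>
      <role>authentication</role>
      <name>ShiroProvider</name>
      <enabled>true</enabled>
    </provider>
    <provider>
      <role>identity-assertion</role>
      <name>Default</name>
      <enabled>true</enabled>
    </provider>
  </gateway>
  <service>
    <role>WEBHDFS</role>
    <url>http://localhost:50070/webhdfs</url>
  </service>
</topology>"""


def use_jwt_provider(sandbox_xml: str) -> str:
    """Replace the Shiro authentication provider with a JWT federation provider."""
    root = ET.fromstring(sandbox_xml)
    gateway = root.find("gateway")

    # Drop the Shiro provider, leave all other providers and services alone.
    for provider in gateway.findall("provider"):
        if provider.findtext("name") == "ShiroProvider":
            gateway.remove(provider)

    # Role and name are assumptions based on the Knox documentation.
    jwt_provider = ET.Element("provider")
    ET.SubElement(jwt_provider, "role").text = "federation"
    ET.SubElement(jwt_provider, "name").text = "JWTProvider"
    ET.SubElement(jwt_provider, "enabled").text = "true"
    gateway.insert(0, jwt_provider)

    return ET.tostring(root, encoding="unicode")


# The result is what would be saved as conf/topologies/sandbox-token.xml.
print(use_jwt_provider(SANDBOX_XML))
```

Unlike the token topology, the 'service' definitions are kept here, because this topology still fronts the WebHDFS REST API.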
Now restart the Apache Knox gateway again (edit: as Larry McCay points out in the comments this is not required, as long as we are not using Ambari to manage the topologies). First obtain a token using curl:
  • curl -u guest:guest-password -k https://localhost:8443/gateway/token/knoxtoken/api/v1/token
Copy the access token that is returned. Then you can invoke on the REST API using the token as follows:
  • curl -kL -H "Authorization: Bearer <access token>" "https://localhost:8443/gateway/sandbox-token/webhdfs/v1/data/LICENSE.txt?op=OPEN"
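The two curl calls can be combined into one small client. The following is a minimal sketch using only the Python standard library. It assumes the token response is JSON with an "access_token" field (the post only says the response contains an access token), and the function names are hypothetical. Certificate verification is disabled to mirror "curl -k", which is only acceptable against a local test gateway.

```python
import base64
import json
import ssl
import urllib.request

GATEWAY = "https://localhost:8443/gateway"


def token_request(user: str, password: str) -> urllib.request.Request:
    """Build the Basic Authentication request to the Knox token service."""
    credentials = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        f"{GATEWAY}/token/knoxtoken/api/v1/token",
        headers={"Authorization": f"Basic {credentials}"},
    )


def webhdfs_open_request(access_token: str, path: str) -> urllib.request.Request:
    """Build the WebHDFS OPEN request, authenticated with the bearer token."""
    return urllib.request.Request(
        f"{GATEWAY}/sandbox-token/webhdfs/v1{path}?op=OPEN",
        headers={"Authorization": f"Bearer {access_token}"},
    )


def insecure_context() -> ssl.SSLContext:
    """Equivalent of 'curl -k': skip certificate checks (test setups only)."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    return context


def fetch_file(user: str, password: str, path: str) -> bytes:
    """Obtain a token, then read the file through the token-protected topology."""
    context = insecure_context()
    with urllib.request.urlopen(token_request(user, password), context=context) as resp:
        # Assumption: the JSON field holding the JWT is named "access_token".
        access_token = json.load(resp)["access_token"]
    # urllib follows redirects by default, similar to 'curl -L'.
    with urllib.request.urlopen(webhdfs_open_request(access_token, path), context=context) as resp:
        return resp.read()
```

With the gateway from this tutorial running, fetch_file("guest", "guest-password", "/data/LICENSE.txt") would be expected to return the contents of the file.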

3 comments:

  1. Nice, Colm!
    FYI - restarting Knox isn't required in order to change or deploy new topologies. Unless you are making changes via Ambari.

    Since the topology names in here are not known by Ambari, I assume that you aren't using it. Knox will pick up the changes automatically and redeploy.

    Thanks for the article!

  2. I am trying to access Hive using Knox with SSO as shown below, but I am getting a 401 error. Any help would be really appreciated.

    jdbc:hive2://knoxhost:443/;ssl=true;transportMode=http;httpPath=gateway/tokenbased/hive;LogLevel=6;AuthMech=3;http.cookie.hadoop-jwt=eyJhbGciOiJSUzI1NiJ9.eyJzdWIiOiJOZWVyYWouVmVybWFAbm9yZHN0cm9tLmNvbSIsImlzcyI6IktOT1hTU08iLCJleHAiOjE1NzI5MDc1MTZ9.DaHlIvLp1dXDz38Q1eTTM0JLQ9IqtGo8T_sWrzWBt2nlae0-WJVQzHyZSUjkgvMuJAwWb7NtLiGDYedBhttMmmhoyY-JmM0Ta2LpBtqozKpolB6c3R7xfGnG8LhQPA8O3eUpq2-Sv0ltaNS63d8uygfbKHWjYhAREX2Sjf0-kK4
