Wednesday, December 13, 2017

A fast way to get membership counts in Apache Syncope

Apache Syncope is a powerful open source Identity Management project, covered extensively on this blog. Amongst many other features, it allows the management of three core types - Users, Groups and "Any Objects", the latter of which can be used to model arbitrary types. These core types can be accessed via a flexible REST API powered by Apache CXF. In this post we will explore the concept of "membership" in Apache Syncope, as well as a new feature added in Syncope 2.0.7 that provides an easy way to see membership counts.

1) Membership in Apache Syncope

Users and "Any Objects" can be members of Groups in two ways - statically and dynamically. "Static" membership is when the User or "Any Object" is explicitly assigned membership of a given Group. "Dynamic" membership is when the Group is defined with a set of rules; if these rules evaluate to true for a given User or "Any Object", then that User or "Any Object" is a member of the Group. For example, a User could be a dynamic member of a Group based on the value of a given User attribute. So we could have an Apache group with a dynamic User membership rule of "*@apache.org" matching an "email" attribute.
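In Apache Syncope, dynamic membership conditions are expressed as FIQL queries. A rule like the one just described might look something like the following (a sketch only - the "email" attribute name is an assumption about the schema in use):
  • email==*@apache.org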

2) Exploring group membership via the REST API

Let's examine group membership with some practical examples. Start Apache Syncope and log in to the admin console. Click on "Groups" and add a new group called "employee", accepting the default options. Now click on the "User" tab and add new Users called "alice" and "bob", with static membership of the "employee" group.

Using a tool like "curl", we can access the REST API using the admin credentials to obtain information on "alice":
  • curl -u admin:password http://localhost:9080/syncope/rest/users/alice
Note that "alice" has a "memberships" attribute pointing to the "employee" group. Next we can see information on the "employee" group via:
  • curl -u admin:password http://localhost:9080/syncope/rest/groups/employee
3) Obtaining membership counts

Now consider obtaining the membership count of a given group. Let's say we are interested in finding out how many employees we have - how can this be done? Prior to Apache Syncope 2.0.7, we had to leverage the power of FIQL, which underpins the search capabilities of the REST API of Apache Syncope:
  • curl -u admin:password http://localhost:9080/syncope/rest/users?fiql=%24groups==employee
In other words, search for all Users who are members of the "employee" group. This returns a long list of all matching Users, even though all we care about is the count (which is encoded in the "totalCount" attribute). There is a new way to do this in Apache Syncope 2.0.7. Instead of having to search for Users, membership counts are now encoded in groups. So we can see the total membership counts for a given group just by doing a GET call:
  • curl -u admin:password http://localhost:9080/syncope/rest/groups/employee
Following the example above, you should see a "staticUserMembershipCount" attribute with a value of "2". Four new attributes are defined for GroupTO (see the sample response sketched after this list):
  • staticUserMembershipCount: The static user membership count of a given group
  • dynamicUserMembershipCount: The dynamic user membership count of a given group
  • staticAnyObjectMembershipCount: The static "Any Object" membership count of a given group
  • dynamicAnyObjectMembershipCount: The dynamic "Any Object" membership count of a given group.
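With these attributes in place, a simple GET call on the group returns the counts directly. The response below is abridged and purely illustrative (a real response contains many more fields), assuming the "employee" group created above:

  {
    "name": "employee",
    "staticUserMembershipCount": 2,
    "dynamicUserMembershipCount": 0,
    "staticAnyObjectMembershipCount": 0,
    "dynamicAnyObjectMembershipCount": 0
  }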
Some consideration was given to returning the Any Object counts associated with a given Any Object type, but this was abandoned for performance reasons.

Friday, December 8, 2017

SAML SSO support for the Apache Syncope web console

Apache Syncope is a powerful open source Identity Management project, that has recently celebrated 5 years as an Apache top level project. Until recently, a username and password had to be supplied to log onto either the admin or enduser web console of Apache Syncope. However, SAML SSO login is supported since the 2.0.3 release. Instead of supplying a username/password, the user is redirected to a third party IdP for login, before being redirected back to the Apache Syncope web console. In 2.0.5, support for the IdP-initiated flow of SAML SSO was added.

In this post we will show how to configure Apache Syncope to use SAML SSO as an alternative to logging in using a username and password. We will use Apache CXF Fediz as the SAML SSO IdP. In addition, we will show how to achieve IdP-initiated SSO using Okta. Please also refer to this tutorial on achieving SAML SSO with Syncope and Shibboleth.

1) Logging in to Apache Syncope using SAML SSO

In this section, we will cover setting up Apache Syncope to re-direct to a third party IdP so that the user can enter their credentials. The next section will cover the IdP-initiated case.

1.a) Enable SAML SSO support in Apache Syncope

First we will configure Apache Syncope to enable SAML SSO support. Download and extract the most recent standalone distribution release of Apache Syncope (2.0.6 was used in this post). Start the embedded Apache Tomcat instance and then open a web browser and navigate to "http://localhost:9080/syncope-console", logging in with the username "admin" and password "password".

Apache Syncope is configured with some sample data to show how it can be used. Click on "Users" and add a new user called "alice" by clicking on the subsequent "+" button. Specify a password for "alice" and then select the default values wherever possible (you will need to specify some required attributes, such as "surname"). Now in the left-hand column, click on "Extensions" and then "SAML 2.0 SP". Click on the "Service Provider" tab and then "Metadata". Save the resulting Metadata document, as it will be required to set up the SAML SSO IdP.

1.b) Set up the Apache CXF Fediz SAML SSO IdP

Next we will turn our attention to setting up the Apache CXF Fediz SAML SSO IdP. Download the most recent source release of Apache CXF Fediz (1.4.3 was used for this tutorial). Unzip the release and build it using maven ("mvn clean install -DskipTests"). In the meantime, download and extract the latest Apache Tomcat 8.5.x distribution (tested with 8.5.24). Once Fediz has finished building, copy all of the "IdP" wars (e.g. in fediz-1.4.3/apache-fediz/target/apache-fediz-1.4.3/apache-fediz-1.4.3/idp/war/fediz-*) to the Tomcat "webapps" directory.

There are a few configuration changes to be made to Apache Tomcat before starting it. Download the HSQLDB jar and copy it to the Tomcat "lib" directory. Next edit 'conf/server.xml' and configure TLS on port 8443:
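A connector along the following lines is what is required (this is a sketch only - the keystore file names and passwords here are assumptions based on the Fediz sample keys, so adjust them to match the keys you copy in the next step):

  <Connector port="8443" protocol="org.apache.coyote.http11.Http11NioProtocol"
             maxThreads="150" SSLEnabled="true" scheme="https" secure="true"
             keystoreFile="idp-ssl-key.jks" keystorePass="tompass"
             truststoreFile="idp-ssl-trust.jks" truststorePass="ispass"
             clientAuth="want" sslProtocol="TLS" />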

The two keys referenced here can be obtained from 'apache-fediz/target/apache-fediz-1.4.3/apache-fediz-1.4.3/examples/samplekeys/' and should be copied to the root directory of Apache Tomcat. Tomcat can now be started.

Next we have to configure Apache CXF Fediz to support Apache Syncope as a "service" via SAML SSO. Edit 'webapps/fediz-idp/WEB-INF/classes/entities-realma.xml' and add the following configuration:
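The application is defined as a Spring bean in this file. The following is only a rough sketch - the exact class and property names should be cross-checked against the sample application beans that already exist in 'entities-realma.xml', and the realm value should match the entity ID in the Syncope metadata saved in step 1.a:

  <bean id="srv-syncope" class="org.apache.cxf.fediz.service.idp.domain.Application">
      <property name="realm" value="http://localhost:9080/syncope-console/" />
      <property name="protocol" value="urn:oasis:names:tc:SAML:2.0:profiles:SSO:browser" />
      <property name="serviceDisplayName" value="Apache Syncope" />
      <property name="serviceDescription" value="Apache Syncope admin console" />
  </bean>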

In addition, we need to make some changes to the "idp-realmA" bean in this file:
  • Add a reference to this bean in the "applications" list: <ref bean="srv-syncope" />
  • Change the "idpUrl" property to: https://localhost:8443/fediz-idp/saml
  • Change the port for "stsUrl" from "9443" to "8443".
Now we need to configure Fediz to accept Syncope's signing cert. Edit the Metadata file you saved from Syncope in step 1.a. Copy the Base-64 encoded certificate in the "KeyDescriptor" section, and paste it (including line breaks) into 'webapps/fediz-idp/WEB-INF/classes/syncope.cert', enclosing it in between "-----BEGIN CERTIFICATE-----" and "-----END CERTIFICATE-----".
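The resulting 'syncope.cert' should look like a standard PEM file:

  -----BEGIN CERTIFICATE-----
  (Base-64 encoded certificate data copied from the "KeyDescriptor" section, preserving the line breaks)
  -----END CERTIFICATE-----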

Now restart Apache Tomcat. Open a browser and save the Fediz metadata which is available at "http://localhost:8080/fediz-idp/metadata?protocol=saml", which we will require when configuring Apache Syncope.

1.c) Configure the Apache CXF Fediz IdP in Syncope

The final configuration step takes place in Apache Syncope again. In the "SAML 2.0 SP" configuration screen, click on the "Identity Providers" tab, then click the "+" button and select the Fediz metadata that you saved in the previous step. Now log out, and an additional login option can be seen on the login page.

Select the URL for the SAML SSO IdP and you will be redirected to Fediz. Select the IdP in realm "A" as the home realm and enter credentials of "alice/ecila" when prompted. You will be successfully authenticated to Fediz and redirected back to the Syncope admin console, where you will be logged in as the user "alice". 

2) Using IdP-initiated SAML SSO

Instead of starting at the Syncope web console, being redirected to the IdP for authentication, and then being redirected back to Syncope, it is possible to start from the IdP. In this section we will show how to configure Apache Syncope to support IdP-initiated SAML SSO using Okta.

2.a) Configuring a SAML application in Okta

The first step is to create an account at Okta and configure a SAML application. This process is mapped out at the following link. Follow the steps listed on this page with the following additional changes:
  • Specify the following for the Single Sign On URL: http://localhost:9080/syncope-console/saml2sp/assertion-consumer
  • Specify the following for the audience URL: http://localhost:9080/syncope-console/
  • Specify the following for the default RelayState: idpInitiated
When the application is configured, you will see an option to "View Setup Instructions". Open this link in a new tab and find the section about the IdP Metadata. Save this to a local file and set it aside for the moment. Next you need to assign the application to the username that you have created at Okta.

2.b) Configure Apache Syncope to support IdP-Initiated SAML SSO

Log on to the Apache Syncope admin console using the admin credentials, and add a new Identity Provider in the SAML 2.0 SP extension as before, using the Okta metadata file that you saved in the previous section. Edit the metadata and select the 'Support Unsolicited Logins' checkbox. Save the metadata and make sure that the Okta user is also a valid user in Apache Syncope.

Now go back to the Okta console and click on the application you have configured for Apache Syncope. You should seamlessly be logged into the Apache Syncope admin console.

Friday, December 1, 2017

Kerberos cross-realm support in Apache Kerby 1.1.0

A recent blog post covered how to install the Apache Kerby KDC. In this post we will build on that tutorial to show how to get a major new feature of Apache Kerby 1.1.0 to work - namely kerberos cross-realm support. Cross-realm support means that the KDCs in realm "A" and realm "B" are configured in such a way that a user who is authenticated in realm "A" can obtain a service ticket for a service in realm "B" without having to explicitly authenticate to the KDC in realm "B".

1) Configure the KDC for the "EXAMPLE.COM" realm

First we will configure the Apache Kerby KDC for the "EXAMPLE.COM" realm. Follow the previous tutorial to install and configure the KDC for this (default) realm. We need to follow some additional steps to get cross-realm support working with a second KDC in realm "EXAMPLE2.COM". Edit 'conf/krb5.conf' and replace the "realms" section with the following configuration:
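The exact values depend on your setup, but the following sketch shows the general idea, assuming the first KDC runs on the port chosen in the previous tutorial (12345) and the second KDC will run on port 54321 (as configured in the next section). The "capaths" section tells clients that there is a direct trust path between the two realms:

  [realms]
    EXAMPLE.COM = {
      kdc = localhost:12345
    }
    EXAMPLE2.COM = {
      kdc = localhost:54321
    }

  [domain_realm]
    .example.com = EXAMPLE.COM
    .example2.com = EXAMPLE2.COM

  [capaths]
    EXAMPLE.COM = {
      EXAMPLE2.COM = .
    }
    EXAMPLE2.COM = {
      EXAMPLE.COM = .
    }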
Next, after restarting the KDC, we need to add a special principal to the KDC to enable cross-realm support:
  • sh bin/kadmin.sh conf/ -k keytabs/admin.keytab
  • addprinc -pw security krbtgt/EXAMPLE2.COM@EXAMPLE.COM
2) Configure the KDC for the "EXAMPLE2.COM" realm

Now we will configure a second KDC for the "EXAMPLE2.COM" realm. Download the Apache Kerby source code as before. Unzip the source and build the distribution via:
  • mvn clean install -DskipTests
  • cd kerby-dist
  • mvn package -Pdist
Copy "kdc-dist" to a location where you wish to install the second KDC. In this directory, create directories called "keytabs" and "runtime". Edit 'conf/backend.conf' and change the value for 'backend.json.dir' to avoid a conflict with the first KDC instance. Then create some keytabs via:
  • sh bin/kdcinit.sh conf keytabs
For testing purposes, we will change the port of the KDC from the default "88" to "54321" to avoid having to run the KDC with administrator privileges. Edit "conf/krb5.conf" and "conf/kdc.conf" and change "88" to "54321". In addition, change the realm from "EXAMPLE.COM" to "EXAMPLE2.COM" in both of these files. Finally, edit 'conf/krb5.conf' and replace the "realms" section with the same configuration shown in section 1 above (both realms must be listed, along with the "domain_realm" and "capaths" entries).
Next start the KDC via:
  • sh bin/start-kdc.sh conf runtime
We need to add a special principal to the KDC to enable cross-realm support, as in the KDC for the "EXAMPLE.COM" realm. Note that it must be the same principal name and password as for the first realm. We will also add a principal for a service in this realm:
  • sh bin/kadmin.sh conf/ -k keytabs/admin.keytab
  • addprinc -pw security krbtgt/EXAMPLE2.COM@EXAMPLE.COM
  • addprinc -pw password service@EXAMPLE2.COM
3) Obtaining a service ticket for service@EXAMPLE2.COM as alice@EXAMPLE.COM

Now we can obtain a service ticket for the service we have configured in the "EXAMPLE2.COM" realm as a user who is authenticated to the "EXAMPLE.COM" realm. Configure the "tool-dist" distribution as per the previous tutorial, updating 'conf/krb5.conf' with the same "realms", "domain_realm" and "capaths" information as shown above. Now we can authenticate as "alice" and obtain a service ticket as follows:
  • sh bin/kinit.sh -conf conf alice@EXAMPLE.COM
  • sh bin/kinit.sh -conf conf -c /tmp/krb5cc_1000 -S service@EXAMPLE2.COM
If you run "klist" then you should see that a ticket for "service@EXAMPLE2.COM" was obtained successfully.

Wednesday, November 29, 2017

Authorizing access to Apache Yarn using Apache Ranger

Earlier this year, I wrote a series of blog posts on how to secure access to the Apache Hadoop filesystem (HDFS), using tools like Apache Ranger and Apache Atlas. In this post, we will go further and show how to authorize access to Apache Yarn using Apache Ranger. Apache Ranger allows us to create and enforce authorization policies based on who is allowed to submit applications to run on Apache Yarn. Therefore it can be used to enforce authorization decisions for Hive on Yarn or Spark on Yarn jobs.

1) Installing Apache Hadoop

First, follow the steps outlined in the earlier tutorial (section 1) on setting up Apache Hadoop, except that in this tutorial we will work with Apache Hadoop 2.8.2. In addition, we will need to follow some additional steps to configure Yarn (see here for the official documentation). Create a new file called 'etc/hadoop/mapred-site.xml' with the content:
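For a standard single-node setup, this file just tells MapReduce to run on YARN - a minimal sketch, following the official documentation:

  <configuration>
      <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
      </property>
  </configuration>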
Next edit 'etc/hadoop/yarn-site.xml' and add:
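Again for a single-node setup, the only property needed here is the shuffle auxiliary service (a minimal sketch):

  <configuration>
      <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
      </property>
  </configuration>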
Now we can start Apache Yarn via 'sbin/start-yarn.sh'. We are going to submit jobs as a local user called "alice" to test authorization. First we need to create some directories in HDFS:
  • bin/hdfs dfs -mkdir -p /user/alice/input
  • bin/hdfs dfs -put etc/hadoop/*.xml /user/alice/input
  • bin/hadoop fs -chown -R alice /user/alice
  • bin/hadoop fs -mkdir /tmp
  • bin/hadoop fs -chmod og+w /tmp
Now we can submit an example job as "alice" via:
  • sudo -u alice bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar grep input output 'dfs[a-z.]+'
The job should run successfully and store the output in '/user/alice/output'. Delete this directory before trying to run the job again ('bin/hadoop fs -rm -r /user/alice/output').

2) Install the Apache Ranger Yarn plugin

Next we will install the Apache Ranger Yarn plugin. Download Apache Ranger and verify that the signature is valid and that the message digests match. Due to some bugs that were fixed for the installation process, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
  • mvn clean package assembly:assembly -DskipTests
  • tar zxvf target/ranger-1.0.0-SNAPSHOT-yarn-plugin.tar.gz
  • mv ranger-1.0.0-SNAPSHOT-yarn-plugin ${ranger.yarn.home}
Now go to ${ranger.yarn.home} and edit "install.properties". You need to specify the following properties:
  • POLICY_MGR_URL: Set this to "http://localhost:6080"
  • REPOSITORY_NAME: Set this to "YarnTest".
  • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Hadoop installation
Save "install.properties" and install the plugin as root via "sudo -E ./enable-yarn-plugin.sh". Make sure that the user who is running Yarn has permission to read the policies stored in '/etc/ranger/YarnTest'. There is one additional step to be performed in Hadoop before restarting Yarn. Edit 'etc/hadoop/ranger-yarn-security.xml' and add a property called "ranger.add-yarn-authorization" with value "false". This means that if Ranger policy authorization fails, it doesn't fall back to the default Yarn ACLs (which allow all users to submit jobs to the default queue).
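For reference, the entry follows the standard Hadoop configuration format:

  <property>
      <name>ranger.add-yarn-authorization</name>
      <value>false</value>
  </property>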

Finally, re-start Yarn and try to resubmit the job as "alice" as per the previous section. You should now see an authorization error: "User alice cannot submit applications to queue root.default".

3) Create authorization policies in the Apache Ranger Admin console

Next we will use the Apache Ranger admin console to create authorization policies for Yarn. Follow the steps in this tutorial to install the Apache Ranger admin service. Start the Apache Ranger admin service with "sudo ranger-admin start" and open a browser and go to "http://localhost:6080/" and log on with "admin/admin". Add a new Yarn service with the following configuration values:
  • Service Name: YarnTest
  • Username: admin
  • Password: admin
  • Yarn REST URL: http://localhost:8088
Click on "Test Connection" to verify that we can connect successfully to Yarn, then save the new service. Now click on the "YarnTest" service that we have created. Add a new policy for the "root.default" queue for the user "alice" (create this user if you have not done so already under "Settings, Users/Groups"), with a permission of "submit-app".

Allow up to 30 seconds for the Apache Ranger plugin to download the new authorization policy from the admin service. Then try to re-run the job as "alice". This time it should succeed due to the authorization policy that we have created.

Tuesday, November 28, 2017

Installing the Apache Kerby KDC

Apache Kerby is a subproject of the Apache Directory project, and is a complete open-source KDC written entirely in Java. Apache Kerby 1.1.0 has just been released. This release contains two major new features: a GSSAPI module (covered previously here) and cross-realm support (the subject of a forthcoming blog post).

I have previously used Apache Kerby in this blog as a KDC to illustrate some security-based test-cases for big data components such as Apache Hadoop, Hive, Storm, etc, by pointing to some code on github that shows how to launch a Kerby KDC using Apache maven. This is convenient as a KDC can be launched with the principals already created via a single maven command. However, it is not suitable if the KDC is to be used in a standalone setting.

In this post, we will show how to create a Kerby KDC distribution without writing any code.

1) Install and configure the Apache Kerby KDC

The first step is to download the Apache Kerby source code. Unzip the source and build the distribution via:
  • mvn clean install -DskipTests
  • cd kerby-dist
  • mvn package -Pdist
The "kerby-dist" directory contains the KDC distribution in "kdc-dist", as well as the client tools in "tool-dist". Copy both the "kdc-dist" and "tool-dist" directories to another location instead of working directly in the Kerby source. In "kdc-dist", create directories called "keytabs" and "runtime". Then create some keytabs via:
  • sh bin/kdcinit.sh conf keytabs
This will create keytabs for the "kadmin" and "protocol" principals, and store them in the "keytabs" directory. For testing purposes, we will change the port of the KDC from the default "88" to "12345" to avoid having to run the KDC with administrator privileges. Edit "conf/krb5.conf" and "conf/kdc.conf" and change "88" to "12345".
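After this change, the "realms" section of 'conf/krb5.conf' should look something like the following sketch:

  [realms]
    EXAMPLE.COM = {
      kdc = localhost:12345
    }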

The Kerby principals are stored in a backend that is configured in "conf/backend.conf". By default this is a JSON file that is stored in "/tmp/kerby/jsonbackend". However, Kerby also supports other more robust backends, such as LDAP, Mavibot, Zookeeper, etc.

We can start the KDC via:
  • sh bin/start-kdc.sh conf runtime
Let's create a new user called "alice":
  • sh bin/kadmin.sh conf/ -k keytabs/admin.keytab
  • addprinc -pw password alice@EXAMPLE.COM
2) Install and configure the Apache Kerby tool dist

We can check that the KDC has started properly using the MIT kinit tool, if it is installed locally:
  • export KRB5_CONFIG=/path.to.kdc.dist/conf/krb5.conf
  • kinit alice (use "password" for the password when prompted)
Now you can see the ticket for alice using "klist". Apache Kerby also ships a "tool-dist" distribution that contains implementations of "kinit", "klist", etc. First call "kdestroy" to remove the ticket previously obtained for "alice". Then go into the directory where "tool-dist" was installed in the previous section. Edit "conf/krb5.conf" and replace "88" with "12345". We can now obtain a ticket for "alice" via:
  • sh bin/kinit.sh -conf conf alice
  • sh bin/klist.sh


Thursday, September 21, 2017

Configuring Kerberos for Hive in Talend Open Studio for Big Data

Earlier this year, I showed how to use Talend Open Studio for Big Data to access data stored in HDFS, where HDFS had been configured to authenticate users using Kerberos. A similar blog post showed how to read data from an Apache Kafka topic using kerberos. In this tutorial I will show how to create a job in Talend Open Studio for Big Data to read data from an Apache Hive table using kerberos. As a prerequisite, please follow a recent tutorial on setting up Apache Hadoop and Apache Hive using kerberos. 

1) Download Talend Open Studio for Big Data and create a job

Download Talend Open Studio for Big Data (6.4.1 was used for the purposes of this tutorial). Unzip the file when it is downloaded and then start the Studio using one of the platform-specific scripts. It will prompt you to download some additional dependencies and to accept the licenses. Click on "Create a new job" and call the job "HiveKerberosRead". In the search bar under "Palette" on the right hand side enter "hive" and hit enter. Drag "tHiveConnection" and "tHiveInput" to the middle of the screen. Do the same for "tLogRow":

"tHiveConnection" will be used to configure the connection to Hive. "tHiveInput" will be used to perform a query on the "words" table we have created in Hive (as per the earlier tutorial linked above), and finally "tLogRow" will just log the data so that we can be sure that it was read correctly. The next step is to join the components up. Right click on "tHiveConnection" and select "Trigger/On Subjob Ok" and drag the resulting line to "tHiveInput". Right click on "tHiveInput" and select "Row/Main" and drag the resulting line to "tLogRow":



2) Configure the components

Now let's configure the individual components. Double click on "tHiveConnection". Select the following configuration options:
  • Distribution: Hortonworks
  • Version: HDP V2.5.0
  • Host: localhost
  • Database: default
  • Select "Use Kerberos Authentication"
  • Hive Principal: hiveserver2/localhost@hadoop.apache.org
  • Namenode Principal: hdfs/localhost@hadoop.apache.org
  • Resource Manager Principal: mapred/localhost@hadoop.apache.org
  • Select "Use a keytab to authenticate"
  • Principal: alice
  • Keytab: Path to "alice.keytab" in the Kerby test project.
  • Unselect "Set Resource Manager"
  • Set Namenode URI: "hdfs://localhost:9000"

Now click on "tHiveInput" and select the following configuration options:
  • Select "Use an existing Connection"
  • Choose the tHiveConnection name from the resulting "Component List".
  • Click on "Edit schema". Create a new column called "word" of type String, and a column called "count" of type int. 
  • Table name: words
  • Query: "select * from words where word == 'Dare'"

Now the only thing that remains is to point to the krb5.conf file that is generated by the Kerby project. Click on "Window/Preferences" at the top of the screen. Select "Talend" and "Run/Debug". Add a new JVM argument: "-Djava.security.krb5.conf=/path.to.kerby.project/target/krb5.conf".
Now we are ready to run the job. Click on the "Run" tab and then hit the "Run" button. The output of the query should appear in the Run window in the Studio.

Wednesday, September 20, 2017

Securing Apache Hive - part VI

This is the sixth and final blog post in a series of articles on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query. The fourth post looked at how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. The fifth post looked at an alternative authorization solution called Apache Sentry.

In this post we will switch our attention from authorization to authentication, and show how we can authenticate Apache Hive users via kerberos.

1) Set up a KDC using Apache Kerby

A github project that uses Apache Kerby to start up a KDC is available here:
  • bigdata-kerberos-deployment: This project contains some tests which can be used to test kerberos with various big data deployments, such as Apache Hadoop etc.
The KDC is a simple junit test that is available here. To run it just comment out the "org.junit.Ignore" annotation on the test method. It uses Apache Kerby to define the following principals for both Apache Hadoop and Apache Hive:
  • hdfs/localhost@hadoop.apache.org
  • HTTP/localhost@hadoop.apache.org
  • mapred/localhost@hadoop.apache.org
  • hiveserver2/localhost@hadoop.apache.org
  • alice@hadoop.apache.org 
Keytabs are created in the "target" folder. Kerby is configured to use a random port to launch the KDC each time, and it will create a "krb5.conf" file containing the random port number in the target directory.

2) Configure Apache Hadoop to use Kerberos

The next step is to configure Apache Hadoop to use Kerberos. As a pre-requisite, follow the first tutorial on Apache Hive so that the Hadoop data and Hive table are set up before we apply Kerberos to the mix. Next, follow the steps in section (2) of an earlier tutorial on configuring Hadoop with Kerberos that I wrote. Some additional steps are also required when configuring Hadoop for use with Hive.

Edit 'etc/hadoop/core-site.xml' and add the following properties (an XML sketch follows the list):
  • hadoop.proxyuser.hiveserver2.groups: *
  • hadoop.proxyuser.hiveserver2.hosts: localhost
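In XML form, these additions look something like the following sketch:

  <property>
      <name>hadoop.proxyuser.hiveserver2.groups</name>
      <value>*</value>
  </property>
  <property>
      <name>hadoop.proxyuser.hiveserver2.hosts</name>
      <value>localhost</value>
  </property>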
The previous tutorial on securing HDFS with kerberos did not specify any kerberos configuration for MapReduce, as it was not required. For Apache Hive we need to configure MapReduce appropriately. We will simplify things by using a single principal for the Job Tracker, Task Tracker and Job History. Create a new file 'etc/hadoop/mapred-site.xml' with the following properties (an XML sketch of the first pair follows the list):
  • mapreduce.framework.name: classic
  • mapreduce.jobtracker.kerberos.principal: mapred/localhost@hadoop.apache.org
  • mapreduce.jobtracker.keytab.file: Path to Kerby mapred.keytab (see above).
  • mapreduce.tasktracker.kerberos.principal: mapred/localhost@hadoop.apache.org
  • mapreduce.tasktracker.keytab.file: Path to Kerby mapred.keytab (see above).
  • mapreduce.jobhistory.kerberos.principal:  mapred/localhost@hadoop.apache.org
  • mapreduce.jobhistory.keytab.file: Path to Kerby mapred.keytab (see above).
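In XML form, the first pair of properties looks like the following sketch (the remaining entries follow exactly the same pattern; the keytab path shown here is just a placeholder for wherever the Kerby test project generated the keytabs on your machine):

  <property>
      <name>mapreduce.jobtracker.kerberos.principal</name>
      <value>mapred/localhost@hadoop.apache.org</value>
  </property>
  <property>
      <name>mapreduce.jobtracker.keytab.file</name>
      <value>/pathtokerby/target/mapred.keytab</value>
  </property>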
Start Kerby by running the JUnit test as described in the first section. Now start HDFS via:
  • sbin/start-dfs.sh
  • sudo sbin/start-secure-dns.sh
3) Configure Apache Hive to use Kerberos

Next we will configure Apache Hive to use Kerberos. Edit 'conf/hiveserver2-site.xml' and add the following properties (an XML sketch follows the list):
  • hive.server2.authentication: kerberos
  • hive.server2.authentication.kerberos.principal: hiveserver2/localhost@hadoop.apache.org
  • hive.server2.authentication.kerberos.keytab: Path to Kerby hiveserver2.keytab (see above).
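In XML form, this configuration looks something like the following sketch (again, the keytab path is a placeholder for the Kerby "target" directory on your machine):

  <property>
      <name>hive.server2.authentication</name>
      <value>kerberos</value>
  </property>
  <property>
      <name>hive.server2.authentication.kerberos.principal</name>
      <value>hiveserver2/localhost@hadoop.apache.org</value>
  </property>
  <property>
      <name>hive.server2.authentication.kerberos.keytab</name>
      <value>/pathtokerby/target/hiveserver2.keytab</value>
  </property>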
Start Hive via 'bin/hiveserver2'. In a separate window, log on to beeline via the following steps:
  • export KRB5_CONFIG=/pathtokerby/target/krb5.conf
  • kinit -k -t /pathtokerby/target/alice.keytab alice
  • bin/beeline -u "jdbc:hive2://localhost:10000/default;principal=hiveserver2/localhost@hadoop.apache.org"
At this point authentication is successful and we should be able to query the "words" table as per the first tutorial.