Wednesday, December 13, 2017

A fast way to get membership counts in Apache Syncope

Apache Syncope is a powerful open source Identity Management project, covered extensively on this blog. Amongst many other features, it allows the management of three core types - Users, Groups and "Any Objects", the latter which can be used to model arbitrary types. These core types can be accessed via a flexible REST API powered by Apache CXF. In this post we will explore the concept of "membership" in Apache Syncope, as well as a new feature that was added for Syncope 2.0.7 which allows an easy way to see membership counts.

1) Membership in Apache Syncope

Users and "Any Objects" can be members of Groups in two ways - statically and dynamically. "Static" membership is when the User or "Any Object" is explicitly assigned membership of a given Group. "Dynamic" membership is when the Group is defined with a set of rules, which if they evaluate to true for a given User or "Any Object", then that User or "Any Object" is a member of the group. For example, a User could be a dynamic member of a group based on the value for a given User attribute. So we could have an Apache group with a dynamic User membership rule of "*@apache.org" matching an "email" attribute.

2) Exploring group membership via the REST API

Let's examine group membership with some practical examples. Start Apache Syncope and log in to the admin console. Click on "Groups" and add a new group called "employee", accepting the default options. Now click on the "User" tab and add new Users called "alice" and "bob", with static membership of the "employee" group.

Using a tool like "curl", we can access the REST API using the admin credentials to obtain information on "alice":
  • curl -u admin:password http://localhost:9080/syncope/rest/users/alice
Note that "alice" has a "memberships" attribute pointing to the "employee" group. Next we can see information on the "employee" group via:
  • curl -u admin:password http://localhost:9080/syncope/rest/groups/employee
3) Obtaining membership counts

Now consider obtaining the membership count of a given group. Let's say we are interested in finding out how many employees we have - how can this be done? Prior to Apache Syncope 2.0.7, we have to leverage the power of FIQL which underpins the search capabilities of the REST API of Apache Syncope:
  • curl -u admin:password http://localhost:9080/syncope/rest/users?fiql=%24groups==employee
In other words, search for all Users who are members of the "employee" group. This returns a long list of all Users, even though all we care about is the count (which is encoded in the "totalCount" attribute). There is a new way to do this Apache Syncope 2.0.7. Instead of having to search for Users, membership counts are now encoded in groups. So we can see the total membership counts for a given group just by doing a GET call:
  • curl -u admin:password http://localhost:9080/syncope/rest/groups/employee
Following the example above, you should see an "staticUserMembershipCount" attribute with a value of "2". Four new attributes are defined for GroupTO:
  • staticUserMembershipCount: The static user membership count of a given group
  • dynamicUserMembershipCount: The dynamic user membership count of a given group
  • staticAnyObjectMembershipCount: The static "Any Object" membership count of a given group
  • dynamicAnyObjectMembershipCount: The dynamic "Any Object" membership count of a given group.
Some consideration was given to returning the Any Object counts associated with a given Any Object type, but this was abandoned due to performance reasons.

Friday, December 8, 2017

SAML SSO support for the Apache Syncope web console

Apache Syncope is a powerful open source Identity Management project, that has recently celebrated 5 years as an Apache top level project. Up to recently, a username and password must be supplied to log onto either the admin or enduser web consoles of Apache Syncope. However SAML SSO login is now supported since the 2.0.3 release. Instead of supplying a username/password, the user is redirected to a third party IdP for login, before redirecting back to the Apache Syncope web console. In 2.0.5, support for the IdP-initiated flow of SAML SSO was added.

In this post we will show how to configure Apache Syncope to use SAML SSO as an alternative to logging in using a username and password. We will use Apache CXF Fediz as the SAML SSO IdP. In addition, we will show how to achieve IdP-initiated SSO using Okta. Please also refer to this tutorial on achieving SAML SSO with Syncope and Shibboleth.

1) Logging in to Apache Syncope using SAML SSO

In this section, we will cover setting up Apache Syncope to re-direct to a third party IdP so that the user can enter their credentials. The next section will cover the IdP-initiated case.

1.a) Enable SAML SSO support in Apache Syncope

First we will configure Apache Syncope to enable SAML SSO support. Download and extract the most recent standalone distribution release of Apache Syncope (2.0.6 was used in this post). Start the embedded Apache Tomcat instance and then open a web browser and navigate to "http://localhost:9080/syncope-console", logging in as "admin" and "password".

Apache Syncope is configured with some sample data to show how it can be used. Click on "Users" and add a new user called "alice" by clicking on the subsequent "+" button. Specify a password for "alice" and then select the default values wherever possible (you will need to specify some required attributes, such as "surname"). Now in the left-hand column, click on "Extensions" and then "SAML 2.0 SP". Click on the "Service Provider" tab and then "Metadata". Save the resulting Metadata document, as it will be required to set up the SAML SSO IdP.

1.b) Set up the Apache CXF Fediz SAML SSO IdP

Next we will turn our attention to setting up the Apache CXF Fediz SAML SSO IdP. Download the most recent source release of Apache CXF Fediz (1.4.3 was used for this tutorial). Unzip the release and build it using maven ("mvn clean install -DskipTests"). In the meantime, download and extract the latest Apache Tomcat 8.5.x distribution (tested with 8.5.24). Once Fediz has finished building, copy all of the "IdP" wars (e.g. in fediz-1.4.3/apache-fediz/target/apache-fediz-1.4.3/apache-fediz-1.4.3/idp/war/fediz-*) to the Tomcat "webapps" directory.

There are a few configuration changes to be made to Apache Tomcat before starting it. Download the HSQLDB jar and copy it to the Tomcat "lib" directory. Next edit 'conf/server.xml' and configure TLS on port 8443:

The two keys referenced here can be obtained from 'apache-fediz/target/apache-fediz-1.4.3/apache-fediz-1.4.3/examples/samplekeys/' and should be copied to the root directory of Apache Tomcat. Tomcat can now be started.

Next we have to configure Apache CXF Fediz to support Apache Syncope as a "service" via SAML SSO. Edit 'webapps/fediz-idp/WEB-INF/classes/entities-realma.xml' and add the following configuration:

In addition, we need to make some changes to the "idp-realmA" bean in this file:
  • Add a reference to this bean in the "applications" list: <ref bean="srv-syncope" />
  • Change the "idpUrl" property to: https://localhost:8443/fediz-idp/saml
  • Change the port for "stsUrl" from "9443" to "8443".
Now we need to configure Fediz to accept Syncope's signing cert. Edit the Metadata file you saved from Syncope in step 1.a. Copy the Base-64 encoded certificate in the "KeyDescriptor" section, and paste it (including line breaks) into 'webapps/fediz-idp/WEB-INF/classes/syncope.cert', enclosing it in between "-----BEGIN CERTIFICATE-----" and "-----END CERTIFICATE-----".

Now restart Apache Tomcat. Open a browser and save the Fediz metadata which is available at "http://localhost:8080/fediz-idp/metadata?protocol=saml", which we will require when configuring Apache Syncope.

1.c) Configure the Apache CXF Fediz IdP in Syncope

The final configuration step takes place in Apache Syncope again. In the "SAML 2.0 SP" configuration screen, click on the "Identity Providers" tab and click the "+" button and select the Fediz metadata that you saved in the previous step. Now logout and an additional login option can be seen:


Select the URL for the SAML SSO IdP and you will be redirected to Fediz. Select the IdP in realm "A" as the home realm and enter credentials of "alice/ecila" when prompted. You will be successfully authenticated to Fediz and redirected back to the Syncope admin console, where you will be logged in as the user "alice". 

2) Using IdP-initiated SAML SSO

Instead of the user starting with the Syncope web console, being redirected to the IdP for authentication, and then redirected back to Syncope - it is possible instead to start from the IdP. In this section we will show how to configure Apache Syncope to support IdP-initiated SAML SSO using Okta.

2.a) Configuring a SAML application in Okta

The first step is to create an account at Okta and configure a SAML application. This process is mapped out at the following link. Follow the steps listed on this page with the following additional changes:
  • Specify the following for the Single Sign On URL: http://localhost:9080/syncope-console/saml2sp/assertion-consumer
  • Specify the following for the audience URL: http://localhost:9080/syncope-console/
  • Specify the following for the default RelayState: idpInitiated
When the application is configured, you will see an option to "View Setup Instructions". Open this link in a new tab and find the section about the IdP Metadata. Save this to a local file and set it aside for the moment. Next you need to assign the application to the username that you have created at Okta.

2.b) Configure Apache Syncope to support IdP-Initiated SAML SSO

Log on to the Apache Syncope admin console using the admin credentials, and add a new IdP Provider in the SAML 2.0 SP extension as before, using the Okta metadata file that you have saved in the previous section. Edit the metadata and select the 'Support Unsolicited Logins' checkbox. Save the metadata and make sure that the Okta user is also a valid user in Apache Syncope.

Now go back to the Okta console and click on the application you have configured for Apache Syncope. You should seemlessly be logged into the Apache Syncope admin console.




Friday, December 1, 2017

Kerberos cross-realm support in Apache Kerby 1.1.0

A recent blog post covered how to install the Apache Kerby KDC. In this post we will build on that tutorial to show how to get a major new feature of Apache Kerby 1.1.0 to work - namely kerberos cross-realm support. Cross-realm support means that the KDCs in realm "A" and realm "B" are configured in such a way that a user who is authenticated in realm "A" can obtain a service ticket for a service in realm "B" without having to explicitly authenticate to the KDC in realm "B".

1) Configure the KDC for the "EXAMPLE.COM" realm

First we will configure the Apache Kerby KDC for the "EXAMPLE.COM" realm. Follow the previous tutorial to install and configure the KDC for this (default) realm. We need to follow some additional steps to get cross-realm support working with a second KDC in realm "EXAMPLE2.COM". Edit 'conf/krb5.conf' and replace the "realms" section with the following configuration:
Next we need to add a special principal to the KDC to enable cross-realm support via (after restarting the KDC):
  • sh bin/kadmin.sh conf/ -k keytabs/admin.keytab
  • addprinc -pw security krbtgt/EXAMPLE2.COM@EXAMPLE.COM
2) Configure the KDC for the "EXAMPLE2.COM" realm

Now we will configure a second KDC for the "EXAMPLE2.COM" realm. Download the Apache Kerby source code as before. Unzip the source and build the distribution via:
  • mvn clean install -DskipTests
  • cd kerby-dist
  • mvn package -Pdist
Copy "kdc-dist" to a location where you wish to install the second KDC. In this directory, create a directory called "keytabs" and "runtime". Edit 'conf/backend.conf' and change the value for 'backend.json.dir' to avoid conflict with the first KDC instance. Then create some keytabs via:
  • sh bin/kdcinit.sh conf keytabs
For testing purposes, we will change the port of the KDC from the default "88" to "54321" to avoid having to run the KDC with administrator privileges. Edit "conf/krb5.conf" and "conf/kdc.conf" and change "88" to "54321". In addition, change the realm from "EXAMPLE.COM" to "EXAMPLE2.COM" in both of these files. As above, edit 'conf/krb5.conf' and replace the "realms" section with the following configuration:
Next start the KDC via:
  • sh bin/start-kdc.sh conf runtime
We need to add a special principal to the KDC to enable cross-realm support, as in the KDC for the "EXAMPLE.COM" realm. Note that it must be the same principal name and password as for the first realm. We will also add a principal for a service in this realm:
  • sh bin/kadmin.sh conf/ -k keytabs/admin.keytab
  • addprinc -pw security krbtgt/EXAMPLE2.COM@EXAMPLE.COM
  • addprinc -pw password service@EXAMPLE2.COM
3) Obtaining a service ticket for service@EXAMPLE2.COM as alice@EXAMPLE.COM

Now we can obtain a service ticket for the service we have configured in the "EXAMPLE2.COM" realm as a user who is authenticated to the "EXAMPLE.COM" realm. Configure the "tool-dist" distribution as per the previous tutorial, updating 'conf/krb5.conf' with the same "realms", "domain_realm" and "capaths" information as shown above. Now we can authenticate as "alice" and obtain a service ticket as follows:
  • sh bin/kinit.sh -conf conf alice@EXAMPLE.COM
  • sh bin/kinit.sh -conf conf -c /tmp/krb5cc_1000 -S service@EXAMPLE2.COM
If you run "klist" then you should see that a ticket for "service@EXAMPLE2.COM" was obtained successfully.

Wednesday, November 29, 2017

Authorizing access to Apache Yarn using Apache Ranger

Earlier this year, I wrote a series of blog posts on how to secure access to the Apache Hadoop filesystem (HDFS), using tools like Apache Ranger and Apache Atlas. In this post, we will go further and show how to authorize access to Apache Yarn using Apache Ranger. Apache Ranger allows us to create and enforce authorization policies based on who is allowed to submit applications to run on Apache Yarn. Therefore it can be used to enforce authorization decisions for Hive on Yarn or Spark on Yarn jobs.

1) Installing Apache Hadoop

First, follow the steps outlined in the earlier tutorial (section 1) on setting up Apache Hadoop, except that in this tutorial we will work with Apache Hadoop 2.8.2. In addition, we will need to follow some additional steps to configure Yarn (see here for the official documentation). Create a new file called 'etc/hadoop/mapred-site.xml' with the content:
Next edit 'etc/hadoop/yarn-site.xml' and add:
Now we can start Apache Yarn via 'sbin/start-yarn.sh'. We are going to submit jobs as a local user called "alice" to test authorization. First we need to create some directories in HDFS:
  • bin/hdfs dfs -mkdir -p /user/alice/input
  • bin/hdfs dfs -put etc/hadoop/*.xml /user/alice/input
  • bin/hadoop fs -chown -R alice /user/alice
  • bin/hadoop fs -mkdir /tmp
  • bin/hadoop fs -chmod og+w /tmp
Now we can submit an example job as "alice" via:
  • sudo -u alice bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar grep input output 'dfs[a-z.]+'
The job should run successfully and store the output in '/user/alice/output'. Delete this directory before trying to run the job again ('bin/hadoop fs -rm -r /user/alice/output').

2) Install the Apache Ranger Yarn plugin

Next we will install the Apache Ranger Yarn plugin. Download Apache Ranger and verify that the signature is valid and that the message digests match. Due to some bugs that were fixed for the installation process, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
  • mvn clean package assembly:assembly -DskipTests
  • tar zxvf target/ranger-1.0.0-SNAPSHOT-yarn-plugin.tar.gz
  • mv ranger-1.0.0-SNAPSHOT-yarn-plugin ${ranger.yarn.home}
Now go to ${ranger.yarn.home} and edit "install.properties". You need to specify the following properties:
  • POLICY_MGR_URL: Set this to "http://localhost:6080"
  • REPOSITORY_NAME: Set this to "YarnTest".
  • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Hadoop installation
Save "install.properties" and install the plugin as root via "sudo -E ./enable-yarn-plugin.sh". Make sure that the user who is running Yarn has the permission to read the policies stored in '/etc/ranger/YarnTest'. There is one additional step to be performed in Hadoop before restarting Yarn. Edit 'etc/hadoop/ranger-yarn-security.xml' and add a property called "ranger.add-yarn-authorization" with value "false". This means that if Ranger policy authorization fails, it doesn't fall back to the default Yarn ACLs (which allow all users to submit jobs to the default queue).

Finally, re-start Yarn and try to resubmit the job as "alice" as per the previous section. You should now see an authorization error: "User alice cannot submit applications to queue root.default".

3) Create authorization policies in the Apache Ranger Admin console

Next we will use the Apache Ranger admin console to create authorization policies for Yarn. Follow the steps in this tutorial to install the Apache Ranger admin service. Start the Apache Ranger admin service with "sudo ranger-admin start" and open a browser and go to "http://localhost:6080/" and log on with "admin/admin". Add a new Yarn service with the following configuration values:
  • Service Name: YarnTest
  • Username: admin
  • Password: admin
  • Yarn REST URL: http://localhost:8088
Click on "Test Connection" to verify that we can connect successfully to Yarn + then save the new service. Now click on the "YarnTest" service that we have created. Add a new policy for the "root.default" queue for the user "alice" (create this user if you have not done so already under "Settings, Users/Groups"), with a permission of "submit-app".

Allow up to 30 seconds for the Apache Ranger plugin to download the new authorization policy from the admin service. Then try to re-run the job as "alice". This time it should succeed due to the authorization policy that we have created.

Tuesday, November 28, 2017

Installing the Apache Kerby KDC

Apache Kerby is a subproject of the Apache Directory project, and is a complete open-source KDC written entirely in Java. Apache Kerby 1.1.0 has just been released. This release contains two major new features: a GSSAPI module (covered previously here) and cross-realm support (the subject of a forthcoming blog post).

I have previously used Apache Kerby in this blog as a KDC to illustrate some security-based test-cases for big data components such as Apache Hadoop, Hive, Storm, etc, by pointing to some code on github that shows how to launch a Kerby KDC using Apache maven. This is convenient as a KDC can be launched with the principals already created via a single maven command. However, it is not suitable if the KDC is to be used in a standalone setting.

In this post, we will show how to create a Kerby KDC distribution without writing any code.

1) Install and configure the Apache Kerby KDC

The first step is to download the Apache Kerby source code. Unzip the source and build the distribution via:
  • mvn clean install -DskipTests
  • cd kerby-dist
  • mvn package -Pdist
The "kerby-dist" directory contains the KDC distribution in "kdc-dist", as well as the client tools in "tool-dist". Copy both "kdc-dist" and "tool-dist" directories to another location instead of working directly in the Kerby source. In "kdc-dist" create a directory called "keytabs" and "runtime". Then create some keytabs via:
  • sh bin/kdcinit.sh conf keytabs
This will create keytabs for the "kadmin" and "protocol" principals, and store them in the "keytabs" directory. For testing purposes, we will change the port of the KDC from the default "88" to "12345" to avoid having to run the KDC with administrator privileges. Edit "conf/krb5.conf" and "conf/kdc.conf" and change "88" to "12345".

The Kerby principals are stored in a backend that is configured in "conf/backend.conf". By default this is a JSON file that is stored in "/tmp/kerby/jsonbackend". However, Kerby also supports other more robust backends, such as LDAP, Mavibot, Zookeeper, etc.

We can start the KDC via:
  • sh bin/start-kdc.sh conf runtime
Let's create a new user called "alice":
  • sh bin/kadmin.sh conf/ -k keytabs/admin.keytab
  • addprinc -pw password alice@EXAMPLE.COM
2) Install and configure the Apache Kerby tool dist

We can check that the KDC has started properly using the MIT kinit tool, if it is installed locally:
  • export KRB5_CONFIG=/path.to.kdc.dist/conf/krb5.conf
  • kinit alice (use "password" for the password when prompted)
Now you can see the ticket for alice using "klist". Apache Kerby also ships a "tool-dist" distribution that contains implementations of "kinit", "klist", etc. First call "kdestroy" to remove the ticket previously obtained for "alice". Then go into the directory where "tool-dist" was installed to in the previous section. Edit "conf/krb5.conf" and replace "88" with "12345". We can now obtain a ticket for "alice" via:
  • sh bin/kinit.sh -conf conf alice
  • sh bin/klist.sh


Thursday, September 21, 2017

Configuring Kerberos for Hive in Talend Open Studio for Big Data

Earlier this year, I showed how to use Talend Open Studio for Big Data to access data stored in HDFS, where HDFS had been configured to authenticate users using Kerberos. A similar blog post showed how to read data from an Apache Kafka topic using kerberos. In this tutorial I will show how to create a job in Talend Open Studio for Big Data to read data from an Apache Hive table using kerberos. As a prerequisite, please follow a recent tutorial on setting up Apache Hadoop and Apache Hive using kerberos. 

1) Download Talend Open Studio for Big Data and create a job

Download Talend Open Studio for Big Data (6.4.1 was used for the purposes of this tutorial). Unzip the file when it is downloaded and then start the Studio using one of the platform-specific scripts. It will prompt you to download some additional dependencies and to accept the licenses. Click on "Create a new job" called "HiveKerberosRead". In the search bar under "Palette" on the right hand side enter "hive" and hit enter. Drag "tHiveConnection" and "tHiveInput" to the middle of the screen. Do the same for "tLogRow":

"tHiveConnection" will be used to configure the connection to Hive. "tHiveInput" will be used to perform a query on the "words" table we have created in Hive (as per the earlier tutorial linked above), and finally "tLogRow" will just log the data so that we can be sure that it was read correctly. The next step is to join the components up. Right click on "tHiveConnection" and select "Trigger/On Subjob Ok" and drag the resulting line to "tHiveInput". Right click on "tHiveInput" and select "Row/Main" and drag the resulting line to "tLogRow":



3) Configure the components

Now let's configure the individual components. Double click on "tHiveConnection". Select the following configuration options:
  • Distribution: Hortonworks
  • Version: HDP V2.5.0
  • Host: localhost
  • Database: default
  • Select "Use Kerberos Authentication"
  • Hive Principal: hiveserver2/localhost@hadoop.apache.org
  • Namenode Principal: hdfs/localhost@hadoop.apache.org
  • Resource Manager Principal: mapred/localhost@hadoop.apache.org
  • Select "Use a keytab to authenticate"
  • Principal: alice
  • Keytab: Path to "alice.keytab" in the Kerby test project.
  • Unselect "Set Resource Manager"
  • Set Namenode URI: "hdfs://localhost:9000"

Now click on "tHiveInput" and select the following configuration options:
  • Select "Use an existing Connection"
  • Choose the tHiveConnection name from the resulting "Component List".
  • Click on "Edit schema". Create a new column called "word" of type String, and a column called "count" of type int. 
  • Table name: words
  • Query: "select * from words where word == 'Dare'"

Now the only thing that remains is to point to the krb5.conf file that is generated by the Kerby project. Click on "Window/Preferences" at the top of the screen. Select "Talend" and "Run/Debug". Add a new JVM argument: "-Djava.security.krb5.conf=/path.to.kerby.project/target/krb5.conf":
Now we are ready to run the job. Click on the "Run" tab and then hit the "Run" button. You should see the following output in the Run Window in the Studio:

Wednesday, September 20, 2017

Securing Apache Hive - part VI

This the sixth and final blog post in a series of articles on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query. The fourth post looked how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. The fifth post looked at an alternative authorization solution called Apache Sentry.

In this post we will switch our attention from authorization to authentication, and show how we can authenticate Apache Hive users via kerberos.

1) Set up a KDC using Apache Kerby

A github project that uses Apache Kerby to start up a KDC is available here:
  • bigdata-kerberos-deployment: This project contains some tests which can be used to test kerberos with various big data deployments, such as Apache Hadoop etc.
The KDC is a simple junit test that is available here. To run it just comment out the "org.junit.Ignore" annotation on the test method. It uses Apache Kerby to define the following principals for both Apache Hadoop and Apache Hive:
  • hdfs/localhost@hadoop.apache.org
  • HTTP/localhost@hadoop.apache.org
  • mapred/localhost@hadoop.apache.org
  • hiveserver2/localhost@hadoop.apache.org
  • alice@hadoop.apache.org 
Keytabs are created in the "target" folder. Kerby is configured to use a random port to lauch the KDC each time, and it will create a "krb5.conf" file containing the random port number in the target directory.

2) Configure Apache Hadoop to use Kerberos

The next step is to configure Apache Hadoop to use Kerberos. As a pre-requisite, follow the first tutorial on Apache Hive so that the Hadoop data and Hive table are set up before we apply Kerberos to the mix. Next, follow the steps in section (2) of an earlier tutorial on configuring Hadoop with Kerberos that I wrote. Some additional steps are also required when configuring Hadoop for use with Hive.

Edit 'etc/hadoop/core-site.xml' and add:
  • hadoop.proxyuser.hiveserver2.groups: *
  • hadoop.proxyuser.hiveserver2.hosts: localhost
The previous tutorial on securing HDFS with kerberos did not specify any kerberos configuration for Map-Reduce, as it was not required. For Apache Hive we need to configure Map Reduce appropriately. We will simplify things by using a single principal for the Job Tracker, Task Tracker and Job History. Create a new file 'etc/hadoop/mapred-site.xml' with the following properties:
  • mapreduce.framework.name: classic
  • mapreduce.jobtracker.kerberos.principal: mapred/localhost@hadoop.apache.org
  • mapreduce.jobtracker.keytab.file: Path to Kerby mapred.keytab (see above).
  • mapreduce.tasktracker.keytab.file: mapred/localhost@hadoop.apache.org
  • mapreduce.tasktracker.keytab.file: Path to Kerby mapred.keytab (see above).
  • mapreduce.jobhistory.kerberos.principal:  mapred/localhost@hadoop.apache.org
  • mapreduce.jobhistory.keytab.file: Path to Kerby mapred.keytab (see above).
Start Kerby by running the JUnit test as described in the first section. Now start HDFS via:
  • sbin/start-dfs.sh
  • sudo sbin/start-secure-dns.sh
3) Configure Apache Hive to use Kerberos

Next we will configure Apache Hive to use Kerberos. Edit 'conf/hiveserver2-site.xml' and add the following properties:
  • hive.server2.authentication: kerberos
  • hive.server2.authentication.kerberos.principal: hiveserver2/localhost@hadoop.apache.org
  • hive.server2.authentication.kerberos.keytab: Path to Kerby hiveserver2.keytab (see above).
Start Hive via 'bin/hiveserver2'. In a separate window, log on to beeline via the following steps:
  • export KRB5_CONFIG=/pathtokerby/target/krb5.conf
  • kinit -k -t /pathtokerby/target/alice.keytab alice
  • bin/beeline -u "jdbc:hive2://localhost:10000/default;principal=hiveserver2/localhost@hadoop.apache.org"
At this point authentication is successful and we should be able to query the "words" table as per the first tutorial.

Friday, September 15, 2017

Securing Apache Hive - part V

This is the fifth in a series of blog posts on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query. The fourth post looked how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. In this post we will look at an alternative authorization solution called Apache Sentry.

1) Build the Apache Sentry distribution

First we will build and install the Apache Sentry distribution. Download Apache Sentry (1.8.0 was used for the purposes of this tutorial). Verify that the signature is valid and that the message digests match. Now extract and build the source and copy the distribution to a location where you wish to install it:
  • tar zxvf apache-sentry-1.8.0-src.tar.gz
  • cd apache-sentry-1.8.0-src
  • mvn clean install -DskipTests
  • cp -r sentry-dist/target/apache-sentry-1.8.0-bin ${sentry.home}
I previously covered the authorization plugin that Apache Sentry provides for Apache Kafka. In addition, Apache Sentry provides an authorization plugin for Apache Hive. For the purposes of this tutorial we will just configure the authorization privileges in a configuration file locally to the Hive Server. Therefore we don't need to do any further configuration to the distribution at this point.

2) Install and configure Apache Hive

Please follow the first tutorial to install and configure Apache Hadoop if you have not already done so. Apache Sentry 1.8.0 does not support Apache Hive 2.1.x, so we will need to download and extract Apache Hive 2.0.1. Set the "HADOOP_HOME" environment variable to point to the Apache Hadoop installation directory above. Then follow the steps as outlined in the first tutorial to create the table in Hive and make sure that a query is successful.

3) Integrate Apache Sentry with Apache Hive

Now we will integrate Apache Sentry with Apache Hive. We need to add three new configuration files to the "conf" directory of Apache Hive.

3.a) Configure Apache Hive to use authorization

Create a file called 'conf/hiveserver2-site.xml' with the content:
Here we are enabling authorization and adding the Sentry authorization plugin.

3.b) Add Sentry plugin configuration

Create a new file in the "conf" directory of Apache Hive called "sentry-site.xml" with the following content:
This is the configuration file for the Sentry plugin for Hive. It essentially says that the authorization privileges are stored in a local file, and that the groups for authenticated users should be retrieved from this file. As we are not using Kerberos, the "testing.mode" configuration parameter must be set to "true".

3.c) Add the authorization privileges for our test-case

Next, we need to specify the authorization privileges. Create a new file in the config directory called "sentry.ini" with the following content:
Here we are granting the user "alice" a role which allows her to perform a "select" on the table "words".

3.d) Add Sentry libraries to Hive

Finally, we need to add the Sentry libraries to Hive. Copy the following files from ${sentry.home}/lib  to ${hive.home}/lib:
  • sentry-binding-hive-common-1.8.0.jar
  • sentry-core-model-db-1.8.0.jar
  • sentry*provider*.jar
  • sentry-core-common-1.8.0.jar
  • shiro-core-1.2.3.jar
  • sentry-policy*.jar
  • sentry-service-*.jar
In addition we need the "sentry-binding-hive-v2-1.8.0.jar" which is not bundled with the Apache Sentry distribution. This can be obtained from "http://repo1.maven.org/maven2/org/apache/sentry/sentry-binding-hive-v2/1.8.0/sentry-binding-hive-v2-1.8.0.jar" instead.

4) Test authorization with Apache Hive

Now we can test authorization after restarting Apache Hive. The user 'alice' can query the table according to our policy:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
  • select * from words where word == 'Dare'; (works)
However, the user 'bob' is denied access:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
  • select * from words where word == 'Dare'; (fails)

Thursday, September 14, 2017

Securing Apache Hive - part IV

This is the fourth in a series of blog posts on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. The third post looked at how to use Apache Ranger to create policies to both mask and filter data returned in the Hive query.

In this post we will show how Apache Ranger can create "tag" based authorization policies for Apache Hive using Apache Atlas. In the second post, we showed how to create a "resource" based policy for "alice" in Ranger, by granting "alice" the "select" permission for the "words" table. Instead, we can grant a user "bob" the "select" permission for a given "tag", which is synced into Ranger from Apache Atlas. This means that we can avoid managing specific resources in Ranger itself.

1) Start Apache Atlas and create entities/tags for Hive

First let's look at setting up Apache Atlas. Download the latest released version (0.8.1) and extract it. Build the distribution that contains an embedded HBase and Solr instance via:
  • mvn clean package -Pdist,embedded-hbase-solr -DskipTests
The distribution will then be available in 'distro/target/apache-atlas-0.8.1-bin'. To launch Atlas, we need to set some variables to tell it to use the local HBase and Solr instances:
  • export MANAGE_LOCAL_HBASE=true
  • export MANAGE_LOCAL_SOLR=true
Now let's start Apache Atlas with 'bin/atlas_start.py'. Open a browser and go to 'http://localhost:21000/', logging on with credentials 'admin/admin'. Click on "TAGS" and create a new tag called "words_tag".  Unlike for HDFS or Kafka, Atlas doesn't provide an easy way to create a Hive Entity in the UI. Instead we can use the following json file to create a Hive Entity for the "words" table that we are using in our example, that is based off the example given here:
You can upload it to Atlas via:
  • curl -v -H 'Accept: application/json, text/plain, */*' -H 'Content-Type: application/json;  charset=UTF-8' -u admin:admin -d @hive-create.json http://localhost:21000/api/atlas/entities
Once the new entity has been uploaded, then you can search for it in the Atlas UI. Once it is found, then click on "+" beside "Tags" and associate the new entity with the "words_tag" tag.

2) Use the Apache Ranger TagSync service to import tags from Atlas into Ranger

To create tag based policies in Apache Ranger, we have to import the entity + tag we have created in Apache Atlas into Ranger via the Ranger TagSync service. After building Apache Ranger then extract the file called "target/ranger-<version>-tagsync.tar.gz". Edit 'install.properties' as follows:
  • Set TAG_SOURCE_ATLAS_ENABLED to "false"
  • Set TAG_SOURCE_ATLASREST_ENABLED to  "true" 
  • Set TAG_SOURCE_ATLASREST_DOWNLOAD_INTERVAL_IN_MILLIS to "60000" (just for testing purposes)
  • Specify "admin" for both TAG_SOURCE_ATLASREST_USERNAME and TAG_SOURCE_ATLASREST_PASSWORD
Save 'install.properties' and install the tagsync service via "sudo ./setup.sh". Start the Apache Ranger admin service via "sudo ranger-admin start" and then the tagsync service via "sudo ranger-tagsync-services.sh start".

3) Create Tag-based authorization policies in Apache Ranger

Now let's create a tag-based authorization policy in the Apache Ranger admin UI (http://localhost:6080). Click on "Access Manager" and then "Tag based policies". Create a new Tag service called "HiveTagService". Create a new policy for this service called "WordsTagPolicy". In the "TAG" field enter a "w" and the "words_tag" tag should pop up, meaning that it was successfully synced in from Apache Atlas. Create an "Allow" condition for the user "bob" with the "select" permissions for "Hive":
We also need to go back to the Resource based policies and edit "cl1_hive" that we created in the second tutorial, and select the tag service we have created above. Once our new policy (including tags) has synced to '/etc/ranger/cl1_hive/policycache' we can test authorization in Hive. Previously, the user "bob" was denied access to the "words" table, as only "alice" was assigned a resource-based policy for the table. However, "bob" can now access the table via the tag-based authorization policy we have created:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
  • select * from words where word == 'Dare';

Monday, September 11, 2017

Integrating JSON Web Tokens with Kerberos using Apache Kerby

JSON Web Tokens (JWTs) are a standard way of encapsulating a number of claims about a particular subject. Kerberos is a long-established and widely-deployed SSO protocol, used extensively in the Big-Data space in recent years. An interesting question is to examine how a JWT could be used as part of the Kerberos protocol. In this post we will consider one possible use-case, where a JWT is used to convey additional authorization information to the kerberized service provider.

This use-case is based on a document available at HADOOP-10959, called "A Complement and Short Term Solution to TokenAuth Based on
Kerberos Pre-Authentication Framework", written by Kai Zheng and Weihua Jiang of Intel (also see here).

1) The test-case

To show how to integrate JWTs with Kerberos we will use a concrete test-case available in my github repo here:
  • cxf-kerberos-kerby: This project contains a number of tests that show how to use Kerberos with Apache CXF, where the KDC used in the tests is based on Apache Kerby
The test-case relevant to this blog entry is the JWTJAXRSAuthenticationTest. Here we have a trivial "double it" JAX-RS service implemented using Apache CXF, which is secured using Kerberos. An Apache Kerby-based KDC is launched which the client code uses to obtain a service ticket using JAAS (all done transparently by CXF), which is sent to the service code as part of the Authorization header when making the invocation.

So far this is just a fairly typical example of a kerberized web-service request. What is different is that the service configuration requires a level of authorization above and beyond the kerberos ticket, by insisting that the user must have a particular role to access the web service. This is done by inserting the CXF SimpleAuthorizingInterceptor into the service interceptor chain. An authenticated user must have the "boss" role to access this service. 

So we need somehow to convey the role of the user as part of the kerberized request. We can do this using a JWT as will be explained in the next few sections.

2) High-level overview of JWT use-case with Kerberos
 
As stated above, we need to convey some additional claims about the user to the service. This can be done by including a JWT containing those claims in the Kerberos service ticket. Let's assume that the user is in possession of a JWT that is issued by an IdP that contains a number of claims relating to that user (including the "role" as required by the service in our test-case). The token must be sent to the KDC when obtaining a service ticket.

The KDC must validate the token (checking the signature is correct, and that the signing identity is trusted, etc.). The KDC must then extract some relevant information from the token and insert it somehow into the service ticket. The kerberos spec defines a structure that can be used for this purposes called the AuthorizationData, which consists of a "type" along with some data to be interpreted according to the "type". We can use this structure to insert the encoded JWT as part of the data.  

On the receiving side, the service can extract the AuthorizationData structure from the received ticket and parse it accordingly to retrieve the JWT, and obtain whatever claims are desired from this token accordingly.

3) Sending a JWT Token to the KDC

Let's take a look at how the test-case works in more detail, starting with the client. The test code retrieves a JWT for "alice" by invoking on the JAX-RS interface of the Apache CXF STS. The token contains the claim that "alice" has the "boss" role, which is required to invoke on the "double it" service. Now we need to send this token to the KDC to retrieve a service ticket for the "double it" service, with the JWT encoded in the ticket.

This cannot be done by the built-in Java GSS implementation. Instead we will use Apache Kerby. Apache Kerby has been covered extensively on this blog (see for example here). As well as providing the implementation for the KDC used in our test-case, Apache Kerby provides a complete GSS implementation that supports tokens in the forthcoming 1.1.0 release. To use the Kerby GSS implementation we need to register the KerbyGssProvider as a Java security provider.

To actually pass the JWT we got from the STS to the Kerby GSS layer, we need to use a custom version of the CXF HttpAuthSupplier interface. The KerbyHttpAuthSupplier implementation takes the JWT String, and creates a Kerby KrbToken class using it. This class is added to the private credential list of the current JAAS Subject. This way it will be available to the Kerby GSS layer, which will send the token to the KDC using Kerberos pre-authentication as defined in the document which is linked at the start of this post.

4) Processing the received token in the KDC

The Apache Kerby-based KDC extracts the JWT token from the pre-authentication data entry and verifies that it is signed and that the issuer is trusted. The KDC is configured in the test-case with a certificate to use for this purpose, and also with an issuer String against which the issuer of the JWT must match. If there is an audience claim in the token, then it must match the principal of the service for which we are requesting a ticket. 

If the verification of the received JWT passes, then it is inserted into the AuthorizationData structure in the issued service ticket. The type that is used is a custom value defined here, as this behaviour is not yet standardized. The JWT is serialized and added to the data part of the token. Note that this behaviour is fully customizable.

5) Processing the AuthorizationData structure on the service end

After the service successfully authenticates the client, we have to access the AuthorizationData part of the ticket to extract the JWT. This can all be done using the Java APIs, Kerby is not required on the receiving side. The standard CXF interceptor for Kerberos is subclassed in the tests, to set up a custom CXF SecurityContext using the GssContext. By casting it to a ExtendedGSSContext, we can access the AuthorizationData and hence the JWT. The role claim is then extracted from the JWT and used to enforce the standard "isUserInRole" method of the CXF SecurityContext. 

If you are interested in exploring this topic further, please get involved with the Apache Kerby project, and help us to further improve and expand this integration between JWT and Kerberos.

Thursday, September 7, 2017

Securing Apache Hive - part III

This is the third in a series of blog posts on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. The second post looked at how to add authorization to the previous tutorial using Apache Ranger. In this post we will extend the authorization scenario by showing how Apache Ranger can be used to create policies to both mask and filter data returned in the Hive query.

1) Data-masking with Apache Ranger

As a pre-requisite to this tutorial, please follow the previous post to set up Apache Hive and to enforce an authorization policy for the user "alice" using Apache Ranger. Now let's imagine that we would like "alice" to be able to see the "counts", but not the actual words themselves. We can create a data-masking policy in Apache Ranger for this. Open a browser and log in at "http://localhost:6080" using "admin/admin" and click on the "cl1_hive" service that we have created in the previous tutorial.

Click on the "Masking" tab and add a new policy called "WordMaskingPolicy", for the "default" database, "words" table and "word" column. Under the mask conditions, add the user "alice" and choose the "Redact" masking option. Save the policy and wait for it to by synced over to Apache Hive:


Now try to login to beeline as "alice" and view the first five entries in the table:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
  • select * from words LIMIT 5;
You should see that the characters in the "word" column have been masked (replaced by "x"s).



2) Row-level filtering with Apache Ranger 

Now let's imagine that we are happy for "alice" to view the "words" in the table, but that we would like to restrict her to words that start with a "D". The previous "access" policy we created for her allows her to view all "words" in the table. We can do this by specifying a row-level filter policy. Click on the "Masking" tab in the UI and disable the policy we created in the previous section.

Now click on the "Row-level Filter" tab and create a new policy called "AliceFilterPolicy" on the "default" database, "words" table. Add a Row Filter condition for the user "alice" with row filter "word LIKE 'D%'". Save the policy and wait for it to by synced over to Apache Hive:


Now try to login to beeline as "alice" as above. "alice" can successfully retrieve all entries where the words start with "D", but no other entries via:
  • select * from words where word like 'D%';

Tuesday, August 1, 2017

Securing Apache Hive - part II

This is the second post in a series of articles on securing Apache Hive. The first post looked at installing Apache Hive and doing some queries on data stored in HDFS. In this post we will show how to add authorization to the previous example using Apache Ranger.

1) Install the Apache Ranger Hive plugin

If you have not done so already, please follow the first post to install and configure Apache Hadoop and Apache Hive. Next download Apache Ranger and verify that the signature is valid and that the message digests match. Due to some bugs that were fixed for the installation process, I am using version 1.0.0-SNAPSHOT in this post. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
  • mvn clean package assembly:assembly -DskipTests
  • tar zxvf target/ranger-1.0.0-SNAPSHOT-hive-plugin.tar.gz
  • mv ranger-1.0.0-SNAPSHOT-hive-plugin ${ranger.hive.home}
Now go to ${ranger.hive.home} and edit "install.properties". You need to specify the following properties:
  • POLICY_MGR_URL: Set this to "http://localhost:6080"
  • REPOSITORY_NAME: Set this to "cl1_hive".
  • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Hive installation
Save "install.properties" and install the plugin as root via "sudo -E ./enable-hive-plugin.sh". The Apache Ranger Hive plugin should now be successfully installed. Make sure that the default policy cache for the Hive plugin '/etc/ranger/cl1_hive/policycache' is readable by the user who is running the Hive server. Then restart the Apache Hive server to enable the authorization plugin.

2) Create authorization policies in the Apache Ranger Admin console

Next we will use the Apache Ranger admin console to create authorization policies for Apache Hive. Follow the steps in this tutorial to install the Apache Ranger admin service. Start the Ranger admin service via 'sudo ranger-admin start' and open a browser at 'http://localhost:6080', logging on with the credentials 'admin/admin'. Click the "+" button next to the "HIVE" logo and enter the following properties:
  • Service Name: cl1_hive
  • Username/Password: admin
  • jdbc.url: jdbc:hive2://localhost:10000
Note that "Test Connection" won't work as the "admin" user will not have the necessary authorization to invoke on Hive at this point. Click "Add" to create the service. If you have not done so in a previous tutorial, click on "Settings" and then "Users/Groups" and add two new users called "alice" and "bob", who we will use to test authorization. Then go back to the newly created "cl1_hive" service, and click "Add new policy" with the following properties:
  • Policy Name: SelectWords
  • database: default
  • table: words
  • Hive column: *
Then under "Allow Conditions", give "alice" the "select" permission and click "Add".


3) Test authorization with Apache Hive

Once our new policy has synced to '/etc/ranger/cl1_hive/policycache' we can test authorization in Hive. The user 'alice' can query the table according to our policy:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n alice
  • select * from words where word == 'Dare'; (works)
However, the user 'bob' is denied access:
  • bin/beeline -u jdbc:hive2://localhost:10000 -n bob
  • select * from words where word == 'Dare'; (fails)

Friday, July 28, 2017

Third party SSO support for Apache Syncope REST services

A recent blog post covered SSO support for Apache Syncope REST services. This was a new feature added in the 2.0.3 release, which allows a user to obtain a JWT from the Syncope "accessTokens/login" REST endpoint. This token can then be used to repeatedly invoke on a Syncope REST service. However, what if you wish to allow a user invoke on a Syncope REST service using a (JWT) token issued by a third party IdP instead? From Syncope 2.0.5 this will be possible.

In this post we will cover how to use a JWT issued by a third-party to invoke on an Apache Syncope REST service. The code is available on github here:
  • cxf-syncope2-webapp: A pre-configured web application of the Syncope core for use in the tests.
  • cxf-syncope2: Some integration tests that use cxf-syncope2-webapp for authentication and authorization purposes. JWTTestIT illustrates third party SSO integration with Syncope as covered in this post.
1) Configuring Apache Syncope to accept third-party JWTs

Naturally, if we invoke on an Apache Syncope REST service using an arbitrary third-party token, access will be denied as Syncope will not be able to validate the signature on the token correctly. By default, Syncope uses the following properties defined in 'security.properties' to both issue and validate signed tokens:
  • jwtIssuer: The issuer of the token
  • jwsKey: The Hex-encoded (symmetric) verification key
The default signature algorithm is the symmetric algorithm HS512. To allow third-party tokens we need to implement the JWTSSOProvider interface provided in Syncope. By default, Syncope searches for JWTSSOProvider implementations on the classpath under the package name "org.apache.syncope.core", so no explicit configuration changes are required to plug in a custom JWTSSOProvider implementation.

When Syncope receives a signed JWT it will query which of the configured JWTSSOProvider implementations can verify the token, by matching the 'getIssuer()' method to the issuer of the token. The 'getAlgorithm()' method should match the signature algorithm of the received token. The 'verify' method should validate the signature of the received token. The implementation used in the tests is available here. A keystore is read in and the certificate contained in it is used to verify the signature on the received token. 

One final interesting point is that we need to map the authenticated JWT subject to a user in Syncope somehow. This is done in the JWTSSOProvider implementation via the 'resolve' method. In our test implementation, we map the JWT subject directly to a Syncope username.

2) Obtain a JWT from the Apache CXF STS using REST

Now that we have set up Apache Syncope to allow third-party JWTs, we need to obtain such a token to get our test-case to work. We will use the Apache CXF Security Token Service (STS) to obtain a JWT. For simplicity we will leverage the REST interface of the CXF STS, which allows us to obtain a token with a simple REST call. The STS is configured via spring to issue signed JWTs. User authentication to the STS is enforced via basic authentication. In the test code, we use the CXF WebClient to invoke on the STS and to get a JWT back:

Now we can use this token with the Syncope client API to call the user "self service" successfully:


Thursday, July 20, 2017

Securing Apache Hive - part I

This is the first post in a series of articles on securing Apache Hive. In this article we will look at installing Apache Hive and doing some queries on data stored in HDFS. We will not consider any security requirements in this post, but the test deployment will be used by future posts in this series on authenticating and authorizing access to Hive.

1) Install and configure Apache Hadoop

The first step is to install and configure Apache Hadoop. Please follow section 1 of this earlier tutorial for information on how to do this. In addition, we need to configure two extra properties in 'etc/hadoop/core-site.xml':
  • hadoop.proxyuser.$user.groups: *
  • hadoop.proxyuser.$user.hosts: localhost
where "$user" above should be replaced with the user that is going to run the hive server below. As we are not using authentication in this tutorial, this allows the $user to impersonate the "anonymous" user, who will connect to Hive via beeline and run some queries.

Once HDFS has started, we need to create some directories for use by Apache Hive, and change the permissions appropriately:
  • bin/hadoop fs -mkdir -p /user/hive/warehouse /tmp
  • bin/hadoop fs -chmod g+w /user/hive/warehouse /tmp
  • bin/hadoop fs -mkdir /data
The "/data" directory will hold a file which represents the output of a map-reduce job. For the purposes of this tutorial, we will use a sample output of the canonical "Word Count" map-reduce job on some text. The file consists of two columns separated by a tab character, where the left column is the word, and the right column is the total count associated with that word in the original document.

I've uploaded such a sample output here. Download it and upload it to the HDFS data directory:
  • bin/hadoop fs -put output.txt /data
2) Install and configure Apache Hive

Now we will install and configure Apache Hive. Download and extract Apache Hive (2.1.1 was used for the purposes of this tutorial). Set the "HADOOP_HOME" environment variable to point to the Apache Hadoop installation directory above. Now we will configure the metastore and start Hiveserver2:
  • bin/schematool -dbType derby -initSchema
  • bin/hiveserver2
In a separate window, we will start beeline to connect to the hive server, where $user is the user who is running Hadoop (necessary as we are going to create some data in HDFS, and otherwise wouldn't have the correct permissions):
  • bin/beeline -u jdbc:hive2://localhost:10000 -n $user
Once we are connected, then create a Hive table and load the map reduce output data into a new table called "words":
  • create table words (word STRING, count INT) row format delimited fields terminated by '\t' stored as textfile;
  • LOAD DATA INPATH '/data/output.txt' INTO TABLE words;
Now we can run some queries on the data as the anonymous user. Log out of beeline and then back in and run some queries via:
  • bin/beeline -u jdbc:hive2://localhost:10000
  • select * from words where word == 'Dare';

Friday, June 30, 2017

Securing Apache Solr - part III

This is the third post in a series of articles on securing Apache Solr. The first post looked at setting up a sample SolrCloud instance and securing access to it via Basic Authentication. The second post looked at how the Apache Ranger admin service can be configured to store audit information in Apache Solr. In this post we will extend the example in the first article to include authorization, by showing how to create and enforce authorization policies using Apache Ranger.

1) Install the Apache Ranger Solr plugin

The first step is to install the Apache Ranger Solr plugin. Download Apache Ranger and verify that the signature is valid and that the message digests match. Now extract and build the source, and copy the resulting plugin to a location where you will configure and install it:
  • mvn clean package assembly:assembly -DskipTests
  • tar zxvf target/ranger-${version}-solr-plugin.tar.gz
  • mv ranger-${version}-solr-plugin ${ranger.solr.home}
Now go to ${ranger.solr.home} and edit "install.properties". You need to specify the following properties:
  • POLICY_MGR_URL: Set this to "http://localhost:6080"
  • REPOSITORY_NAME: Set this to "solr_service".
  • COMPONENT_INSTALL_DIR_NAME: The location of your Apache Solr server directory
Save "install.properties" and install the plugin as root via "sudo -E ./enable-solr-plugin.sh". Make sure that the user who is running Solr can read the "/etc/ranger/solr_service/policycache". Now follow the first tutorial to get an example SolrCloud instance up and running with a "gettingstarted" collection. We will not enable the authorization plugin just yet.

2) Create authorization policies for Solr using the Apache Ranger Admin service

Now follow the second tutorial to download and install the Apache Ranger admin service. To avoid conflicting with the Solr example we are securing, we will skip the section about auditing to Apache Solr (sections 3 and 4). In addition, in section 5 the "audit_store" property can be left empty, and the Solr audit properties can be omitted. Start the Apache Ranger admin service via: "sudo ranger-admin start", and open a browser at "http://localhost:6080", logging on with "admin/admin" credentials. Click on the "+" button for the Solr service and create a new service with the following properties:
  • Service Name: solr_service
  • Username: alice
  • Password: SolrRocks
  • Solr URL: http://localhost:8983/solr
Hit the "Test Connection" button and it should show that it has successfully connected to Solr. Click "Add" and then click on the "solr_service" link that is subsequently created. We will grant a policy that allows "alice" the ability to read the "gettingstarted" collection. If "alice" is not already created, go to "Settings/User+Groups" and create a new user there. Delete the default policy that is created in the "solr_service" and then click on "Add new policy" and create a new policy called "gettingstarted_policy". For "Solr Collection" enter "g" here and the "gettingstarted" collection should pop up. Add a new "allow condition" granting the user "alice" the "others" and "query" permissions.




3) Test authorization using the Apache Ranger plugin for Solr

Now we are ready to enable the Apache Ranger authorization plugin for Solr. Download the following security configuration which enables Basic Authentication in Solr as well as the Apache Ranger authorization plugin:
Now upload this configuration to the Apache Zookeeper instance that is running with Solr:
  • server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd putfile /security.json security.json
 Now let's try to query the "gettingstarted" collection as 'alice':
  • curl -u alice:SolrRocks http://localhost:8983/solr/gettingstarted/query?q=author_s:Arthur+Miller
This should be successful. However, authorization will fail for the case of "bob":
  • curl -u bob:SolrRocks http://localhost:8983/solr/gettingstarted/query?q=author_s:Arthur+Miller
In addition, although "alice" can query the collection, she can't write to it, and the following query will return 403:
  • curl -u alice:SolrRocks http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book4", "title_t" : "Hamlet", "author_s" : "William Shakespeare"}]'

Tuesday, June 27, 2017

Securing Apache Solr - part II

This is the second post in a series of articles on securing Apache Solr. The first post looked at setting up a sample SolrCloud instance and securing access to it via Basic Authentication. In this post we will temporarily deviate from the concept of "securing Apache Solr", and instead look at how the Apache Ranger admin service can be configured to store audit information in Apache Solr.

1) Download and extract the Apache Ranger admin service

The first step is to download the source code, as well as the signature file and associated message digests (all available on the download page). Verify that the signature is valid and that the message digests match. Now extract and build the source, and copy the resulting admin archive to a location where you wish to install the UI:
  • tar zxvf apache-ranger-incubating-1.0.0.tar.gz
  • cd apache-ranger-incubating-1.0.0
  • mvn clean package assembly:assembly 
  • tar zxvf target/ranger-1.0.0-admin.tar.gz
  • mv ranger-1.0.0-admin ${rangerhome}
2) Install MySQL

The Apache Ranger Admin UI requires a database to keep track of users/groups as well as policies for various big data projects that you are securing via Ranger. For the purposes of this tutorial, we will use MySQL. Install MySQL in $SQL_HOME and start MySQL via:
  • sudo $SQL_HOME/bin/mysqld_safe --user=mysql
Now you need to log on as the root user and create two users for Ranger. We need a root user with admin privileges (let's call this user "admin") and a user for the Ranger Schema (we'll call this user "ranger"):
  • CREATE USER 'admin'@'localhost' IDENTIFIED BY 'password';
  • GRANT ALL PRIVILEGES ON * . * TO 'admin'@'localhost' WITH GRANT OPTION;
  • CREATE USER 'ranger'@'localhost' IDENTIFIED BY 'password';
  • FLUSH PRIVILEGES;
Finally,  download the JDBC driver jar for MySQL and put it in ${rangerhome}.

3) Configure Apache Solr to support auditing from Ranger

Before installing the Apache Ranger admin service we will need to configure Apache Solr. The Apache Ranger admin service ships with a script to make this easier to configure. Edit 'contrib/solr_for_audit_setup/install.properties' with the following properties:
  • SOLR_USER/SOLR_GROUP - the user/group you are running solr as
  • SOLR_INSTALL_FOLDER - Where you have extracted Solr to as per the first tutorial.
  • SOLR_RANGER_HOME - Where to install the Ranger configuration for Solr auditing.
  • SOLR_RANGER_PORT - The port to be used (8983 as per the first tutorial).
  • SOLR_DEPLOYMENT - solrcloud
  • SOLR_HOST_URL - http://localhost:8983
  • SOLR_ZK - localhost:2181
Make sure that the user running Solr has permission to write to the value configured for "SOLR_LOG_FOLDER" (/var/log/solr/ranger_audits). Now in 'contrib/solr_for_audit_setup' run 'sudo -E ./setup.sh'. The Solr configuration is now copied to $SOLR_RANGER_HOME.

4) Start Apache Zookeeper and SolrCloud

Before starting Apache Solr we will need to start Apache Zookeeper. Download Apache Zookeeper and start it on port 2181 via (this step was not required in the previous tutorial as we were launching SolrCloud with an embedded Zookeeper instance):
  • bin/zkServer.sh start
As per the first post, we want to secure access to SolrCloud via Basic Authentication (note that this is only recently fixed in Apache Ranger). So follow the steps in this post to upload the security.json to Zookeeper via:
  • server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd putfile /security.json security.json
Start Solr as follows in the '${SOLR_RANGER_HOME}/ranger_audit_server/scripts' directory:
  • ./add_ranger_audits_conf_to_zk.sh 
  • ./start_solr.sh
Edit 'create_ranger_audits_collection.sh' and change 'curl --negotiate -u :' to 'curl -u "alice:SolrRocks"'. Save it and then run:
  • ./create_ranger_audits_collection.sh
5) Install the Apache Ranger Admin UI

Edit ${rangerhome}/install.properties and make the following changes:
  • Change SQL_CONNECTOR_JAR to point to the MySQL JDBC driver jar that you downloaded above.
  • Set (db_root_user/db_root_password) to (admin/password)
  • Set (db_user/db_password) to (ranger/password)
  • audit_solr_urls: http://localhost:8983/solr/ranger_audits
  • audit_solr_user: alice
  • audit_solr_password: SolrRocks
  • audit_solr_zookeepers: localhost:2181
Now you can run the setup script via "sudo -E ./setup.sh". When this is done then start the Apache Ranger admin service via: "sudo ranger-admin start".

6) Test that auditing is working correctly in the Ranger Admin service

Open a browser and navigate to "http://localhost:6080". Try to log on first using some made up credentials. Then log in using "admin/admin". Click on the "Audit" tab and then select "Login Sessions". You should see the incorrect and the correct login attempts, meaning that ranger is successfully storing and retrieving audit information in Solr:


Monday, June 26, 2017

Securing Apache Solr - part I

This is the first post in a series of articles on securing Apache Solr. In this post we will look at deploying an example SolrCloud instance and securing access to it via basic authentication.

1) Install and deploy a SolrCloud example

Download and extract Apache Solr (6.6.0 was used for the purpose of this tutorial). Now start SolrCloud via:
  • bin/solr -e cloud
Accept all of the default options. This creates a cluster of two nodes, with a collection "gettingstarted" split into two shards and two replicas per-shard. A web interface is available after startup at: http://localhost:8983/solr/.

Once the cluster is up and running we can post some data to the collection we have created via the REST interface:
  • curl http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book1", "title_t" : "The Merchant of Venice", "author_s" : "William Shakespeare"}]'
  • curl http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book2", "title_t" : "Macbeth", "author_s" : "William Shakespeare"}]'
  • curl http://localhost:8983/solr/gettingstarted/update -d '[ {"id" : "book3", "title_t" : "Death of a Salesman", "author_s" : "Arthur Miller"}]'
We can search the REST interface to for example return all entries by William Shakespeare as follows:
  • curl http://localhost:8983/solr/gettingstarted/query?q=author_s:William+Shakespeare
2) Authenticating users to our SolrCloud instance

Now that our SolrCloud instance is up and running, let's look at how we can secure access to it, by using HTTP Basic Authentication to authenticate our REST requests. Download the following security configuration which enables Basic Authentication in Solr:
Two users are defined - "alice" and "bob" - both with password "SolrRocks". Now upload this configuration to the Apache Zookeeper instance that is running with Solr:
  • server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:9983 -cmd putfile /security.json security.json
Now try to run the search query above again using Curl. A 401 error will be returned. Once we specify the correct credentials then the request will work as expected, e.g.:
  • curl -u alice:SolrRocks http://localhost:8983/solr/gettingstarted/query?q=author_s:Arthur+Miller

Thursday, June 22, 2017

SSO support for Apache Syncope REST services

Apache Syncope has recently added SSO support for its REST services in the 2.0.3 release. Previously, access to the REST services of Syncope was via HTTP Basic Authentication. From the 2.0.3 release, SSO support is available using JSON Web Tokens (JWT). In this post, we will look at how this works and how it can be configured.

1) Obtaining an SSO token from Apache Syncope

As stated above, in the past it was necessary to supply HTTP Basic Authentication credentials when invoking on the REST API. Let's look at an example using curl. Assume we have a running Apache Syncope instance with a user "alice" with password "ecila". We can make a GET request to the user self service via:
  • curl -u alice:ecila http://localhost:8080/syncope/rest/users/self
It may be inconvenient to supply user credentials on each request or the authentication process might not scale very well if we are authenticating the password to a backend resource. From Apache Syncope 2.0.3, we can instead get an SSO token by sending a POST request to "accessTokens/login" as follows:
  • curl -I -u alice:ecila -X POST http://localhost:8080/syncope/rest/accessTokens/login
The response contains two headers:
  • X-Syncope-Token: A JWT token signed according to the JSON Web Signature (JWS) spec.
  • X-Syncope-Token-Expire: The expiry date of the token
The token in question is signed using the (symmetric) "HS512" algorithm. It contains the subject "alice" and the issuer of the token ("ApacheSyncope"), as well as a random token identifier, and timestamps that indicate when the token was issued, when it expires, and when it should not be accepted before.

The signing key and the issuer name can be changed by editing 'security.properties' and specifying new values for 'jwsKey' and 'jwtIssuer'. Please note that it is critical to change the signing key from the default value! It is also possible to change the signature algorithm from the next 2.0.4 release via a custom 'securityContext.xml' (see here). The default lifetime of the token (120 minutes) can be changed via the "jwt.lifetime.minutes" configuration property for the domain.

2) Using the SSO token to invoke on a REST service

Now that we have an SSO token, we can use it to invoke on a REST service instead of specifying our username and password as before. For Syncope 2.0.3 only, the header name is the same as the header name above "X-Syncope-Token". From Syncope 2.0.4 onwards, the header name is "Authorization: Bearer <token>", e.g.:
  • curl -H "Authorization: Bearer eyJ0e..." http://localhost:8080/syncope/rest/users/self
The signature is first checked on the token, then the issuer is verified so that it matches what is configured, and then the expiry and not-before dates are checked. If the identifier matches that of a saved access token then authentication is successful.

Finally, SSO tokens can be seen in the admin console under "Dashboard/Access Token", where they can be manually revoked by the admin user: