Note that we will only use Sqoop 2 (current version 1.99.7), as this is the only version that both Sentry and Ranger support. However, this version is not (yet) recommended for production deployment.
1) Set up Apache Hadoop and Apache Kafka
First we will set up Apache Hadoop and Apache Kafka. The use case is that we want to transfer a file from HDFS (/data/LICENSE.txt) to a Kafka topic (test). Follow part (1) of an earlier tutorial I wrote about installing Apache Hadoop. The following change is also required for 'etc/hadoop/core-site.xml' (in addition to the "fs.defaultFS" setting that is configured in the earlier tutorial):
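Namely, the Sqoop 2 server impersonates the end user when it accesses HDFS, and Hadoop must be configured to explicitly allow this impersonation via proxy-user settings. A minimal sketch, assuming (hypothetically) that the Sqoop server will run as the user 'sqoop2' (substitute the user you will actually run the server as):

  <!-- allow the 'sqoop2' user (an assumed name) to impersonate any user from any host -->
  <property>
    <name>hadoop.proxyuser.sqoop2.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.sqoop2.groups</name>
    <value>*</value>
  </property>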
Make sure that LICENSE.txt is uploaded to the /data directory as outlined in the tutorial. Now we will set up Apache Kafka. Download Apache Kafka and extract it (1.0.0 was used for the purposes of this tutorial). Start ZooKeeper with:
- bin/zookeeper-server-start.sh config/zookeeper.properties
and the Kafka broker (in a separate window) with:
- bin/kafka-server-start.sh config/server.properties
Next, create a "test" topic with:
- bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test
and start a console consumer on the "test" topic (in another window) with:
- bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning --consumer.config config/consumer.properties
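At this point it's worth sanity-checking the Kafka installation itself, before Sqoop enters the picture. Start a console producer in yet another window and type a few lines; each line should show up in the consumer started above:
- bin/kafka-console-producer.sh --broker-list localhost:9092 --topic test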
2) Set up Apache Sqoop
Download Apache Sqoop and extract it (1.99.7 was used for the purposes of this tutorial).
2.a) Configure + start Sqoop
Before starting Sqoop, edit 'conf/sqoop.properties' and change the following property so that it points to the Hadoop configuration directory (e.g. /path.to.hadoop/etc/hadoop):
- org.apache.sqoop.submission.engine.mapreduce.configuration.directory
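For example, the edited line might read as follows (the path is specific to your installation):
- org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/path.to.hadoop/etc/hadoop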
Then set the HADOOP_HOME environment variable, initialize and verify the Sqoop repository, and start the server:
- export HADOOP_HOME=/path.to.hadoop
- bin/sqoop2-tool upgrade
- bin/sqoop2-tool verify
- bin/sqoop2-server start (use 'bin/sqoop2-server stop' to stop it again)
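As a quick check that the server came up successfully, you can query its REST API (this assumes the default Sqoop 2 server port of 12000); the call should return version information for the server:
- curl http://localhost:12000/sqoop/version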
Now that Sqoop has started, we need to configure it to transfer data from HDFS to Kafka. Start the shell via:
- bin/sqoop2-shell
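Before creating any links, it is useful to list the connectors the server knows about; both 'hdfs-connector' and 'kafka-connector' should appear in the output:
- show connector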
First, create a link for the HDFS connector via:
- create link -connector hdfs-connector
- Name: HDFS
- URI: hdfs://localhost:9000
- Conf directory: Path to Hadoop conf directory
Next, create a link for the Kafka connector via:
- create link -connector kafka-connector
- Name: KAFKA
- Kafka brokers: localhost:9092
- Zookeeper quorum: localhost:2181
Now create a job that transfers data from the HDFS link to the Kafka link via:
- create job -f HDFS -t KAFKA
- Name: testjob
- Input Directory: /data
- Topic: test
Finally, start the job via:
- start job -name testjob
If everything is working correctly, the contents of LICENSE.txt should now appear in the window of the Kafka console consumer we started earlier.
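The shell also provides a command to check on the job (for example, to see whether it is still running or has completed):
- status job -name testjob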