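For installation and for preparing the JDBC driver, see Lesson 6. Here I will walk through a concrete example of how to use Sqoop2.

Data preparation

We have a MySQL table called workers that contains three rows, and we want to import it into Hadoop.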
This is the table definition:

    create table `workers` (
      `id` int(11) not null auto_increment,
      `name` varchar(20) not null,
      primary key (`id`)
    ) engine=myisam default charset=utf8;

Insert three rows:

    insert into workers (name) values ('jack');
    insert into workers (name) values ('vicky');
    insert into workers (name) values ('martin');
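If you would rather load the table from a script than type the statements by hand, the same SQL can be saved to a file and fed to the mysql client. This is only a minimal sketch: the file name workers.sql is an arbitrary choice, and the user and database names in the final comment are placeholders for your own.

```shell
# Write the table definition and the sample rows to a file.
# "workers.sql" is an arbitrary file name chosen for this sketch.
# The quoted 'EOF' keeps the shell from interpreting the backticks.
cat > workers.sql <<'EOF'
create table `workers` (
  `id` int(11) not null auto_increment,
  `name` varchar(20) not null,
  primary key (`id`)
) engine=myisam default charset=utf8;

insert into workers (name) values ('jack');
insert into workers (name) values ('vicky');
insert into workers (name) values ('martin');
EOF

# Count the insert statements as a quick sanity check.
row_count=$(grep -c '^insert into workers' workers.sql)
echo "$row_count insert statements written"

# To load the file (placeholders: user "sqoop", database "database"):
#   mysql -u sqoop -p database < workers.sql
```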
Next, we use the Sqoop client to do the import.

Importing the data

Creating the database connection

    $ sqoop2
    Sqoop home directory: /usr/lib/sqoop2
    Sqoop Shell: Type 'help' or '\h' for help.

    sqoop:000> create connection --cid 1
This command creates a connection for the connector with id 1. Sqoop then prompts you for the required parameters:

    Creating connection for connector with id 1
    Please fill following values to create new connection object
    Name: First connection

    Configuration configuration
    JDBC Driver Class: com.mysql.jdbc.Driver
    JDBC Connection String: jdbc:mysql://mysql.server/database
    Username: sqoop
    Password: *****
    JDBC Connection Properties:
    There are currently 0 values in the map:
    entry#

    Security related configuration options
    Max connections: 0
    New connection was successfully created with validation status FINE and persistent id 1

Remember to replace jdbc:mysql://mysql.server/database with the connection string of your real database.
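A MySQL JDBC connection string has the form jdbc:mysql://host:port/database. As a small illustration, the sketch below assembles one from its parts. The host, port and database values are placeholders (3306 is MySQL's default port), and the variable names are made up for this example.

```shell
# Placeholder values; replace them with your own MySQL host, port and database.
mysql_host="mysql.server"
mysql_port="3306"        # MySQL's default port
mysql_db="database"

# Assemble the string to paste at the "JDBC Connection String" prompt.
jdbc_url="jdbc:mysql://${mysql_host}:${mysql_port}/${mysql_db}"
echo "$jdbc_url"
```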
Creating the job

Create a job of type import for the connection with id 1:

    sqoop:000> create job --xid 1 --type import

Sqoop then prompts you for the parameters it needs. You only have to enter the job name, the table name and the output directory. Choose 0 for each of the storage options and press Enter for everything else.

    Creating job for connection with id 1
    Please fill following values to create new job object
    Name: First job

    Database configuration
    Table name: workers
    Table SQL statement:
    Table column names:
    Partition column name:
    Boundary query:

    Output configuration
    Storage type:
      0 : HDFS
    Choose: 0
    Output format:
      0 : TEXT_FILE
      1 : SEQUENCE_FILE
    Choose: 0
    Compression format:
      0 : NONE
      1 : DEFAULT
      2 : DEFLATE
      3 : GZIP
      4 : BZIP2
      5 : LZO
      6 : LZ4
      7 : SNAPPY
    Choose: 0
    Output directory: /user/jarcec/workers
    New job was successfully created with validation status FINE and persistent id 1
Running the job

Use the start job command to run the job, passing the job id with --jid:

    sqoop:000> start job --jid 1
    Submission details
    Job ID: 1
    Server URL: http://localhost:12000/sqoop/
    Created by: root
    Creation date: 2014-11-26 16:41:30 CST
    Lastly updated by: root
    External ID: job_1406097234796_0006
            N/A
    2014-11-26 16:41:30 CST: BOOTING  - Progress is not available
Checking the result

Open another SSH terminal and look at the result with the hdfs command:

    $ hdfs dfs -ls /user/jarcec/workers/
    Found 3 items
    -rw-r--r--   2 sqoop2 supergroup          0 2014-11-26 16:42 /user/jarcec/workers/_SUCCESS
    -rw-r--r--   2 sqoop2 supergroup          9 2014-11-26 16:41 /user/jarcec/workers/part-m-00000
    -rw-r--r--   2 sqoop2 supergroup         21 2014-11-26 16:42 /user/jarcec/workers/part-m-00001

Three files were generated: an empty _SUCCESS marker and two part files that hold the data. Let's cat the part files to see their contents:

    $ hdfs dfs -cat /user/jarcec/workers/part-m-00000
    1,'jack'

    $ hdfs dfs -cat /user/jarcec/workers/part-m-00001
    2,'vicky'
    3,'martin'
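As a quick cross-check, the byte sizes in the directory listing (9 and 21) should match the contents shown by cat, counting one newline per row. The sketch below reproduces that arithmetic locally without touching HDFS. The variable names are made up for this example.

```shell
# Recreate the expected contents of each part file and count the bytes.
# tr strips the padding that some wc implementations add.
part0_bytes=$(printf "1,'jack'\n" | wc -c | tr -d ' ')
part1_bytes=$(printf "2,'vicky'\n3,'martin'\n" | wc -c | tr -d ' ')

echo "part-m-00000: $part0_bytes bytes"   # prints "part-m-00000: 9 bytes"
echo "part-m-00001: $part1_bytes bytes"   # prints "part-m-00001: 21 bytes"
```

Both numbers agree with the listing, so the three rows arrived intact and were split across the two part files.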
That's all for today. The next lesson will cover exporting.
