Setting external jars to hadoop classpath

Problem description

I am trying to add external jars to the Hadoop classpath, but have had no luck so far.

I have the following setup:

$ hadoop version
Hadoop 2.0.6-alpha
Subversion https://git-wip-us.apache.org/repos/asf/bigtop.git -r ca4c88898f95aaab3fd85b5e9c194ffd647c2109
Compiled by jenkins on 2013-10-31T07:55Z
From source with checksum 95e88b2a9589fa69d6d5c1dbd48d4e
This command was run using /usr/lib/hadoop/hadoop-common-2.0.6-alpha.jar

Classpath

$ echo $HADOOP_CLASSPATH
/home/tom/workspace/libs/opencsv-2.3.jar

I can see that the above HADOOP_CLASSPATH has been picked up by Hadoop:

$ hadoop classpath
/etc/hadoop/conf:/usr/lib/hadoop/lib/:/usr/lib/hadoop/.//:/home/tom/workspace/libs/opencsv-2.3.jar:/usr/lib/hadoop-hdfs/./:/usr/lib/hadoop-hdfs/lib/:/usr/lib/hadoop-hdfs/.//:/usr/lib/hadoop-yarn/lib/:/usr/lib/hadoop-yarn/.//:/usr/lib/hadoop-mapreduce/lib/:/usr/lib/hadoop-mapreduce/.//

Command

$ sudo hadoop jar FlightsByCarrier.jar FlightsByCarrier /user/root/1987.csv /user/root/result

I tried the -libjars option as well:

$ sudo hadoop jar FlightsByCarrier.jar FlightsByCarrier /user/root/1987.csv /user/root/result -libjars /home/tom/workspace/libs/opencsv-2.3.jar

Stack trace

14/11/04 16:43:23 INFO mapreduce.Job: Running job: job_1415115532989_0001
14/11/04 16:43:55 INFO mapreduce.Job: Job job_1415115532989_0001 running in uber mode : false
14/11/04 16:43:56 INFO mapreduce.Job: map 0% reduce 0%
14/11/04 16:45:27 INFO mapreduce.Job: map 50% reduce 0%
14/11/04 16:45:27 INFO mapreduce.Job: Task Id : attempt_1415115532989_0001_m_000001_0, Status : FAILED
Error: java.lang.ClassNotFoundException: au.com.bytecode.opencsv.CSVParser
    at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at FlightsByCarrierMapper.map(FlightsByCarrierMapper.java:19)
    at FlightsByCarrierMapper.map(FlightsByCarrierMapper.java:10)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:757)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:339)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:158)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1478)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:153)

Any help is much appreciated.

Recommended answer

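Your external jar is missing on the nodes that run the map tasks. HADOOP_CLASSPATH only affects the JVM that the hadoop command starts on the client machine, so the task JVMs on the cluster never see the jar. You have to add the jar to the distributed cache to make it available there. Try: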

DistributedCache.addFileToClassPath(new Path("pathToJar"), conf);
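
To confirm this diagnosis, before and after adding the jar to the cache, you can check from inside the task whether the opencsv class can be loaded at all. The helper below is a hypothetical sketch that uses only the JDK. ClasspathCheck, isLoadable and report are illustrative names and are not part of Hadoop or opencsv. You could call ClasspathCheck.report("au.com.bytecode.opencsv.CSVParser") from the mapper's setup() method and log the result. The task logs would then show whether the class is visible, along with the task JVM's java.class.path.

```java
// Hypothetical diagnostic helper (not part of Hadoop or opencsv); JDK only.
public final class ClasspathCheck {

    private ClasspathCheck() {}

    /** Returns true if the named class can be found by this class's class loader. */
    public static boolean isLoadable(String className) {
        try {
            // initialize = false: only check visibility, do not run static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    /** One-line report suitable for logging, e.g. from a mapper's setup(). */
    public static String report(String className) {
        return className
                + (isLoadable(className) ? " is visible" : " is NOT visible")
                + "; java.class.path=" + System.getProperty("java.class.path");
    }

    public static void main(String[] args) {
        // Prints whether opencsv is on the classpath of the current JVM
        System.out.println(report("au.com.bytecode.opencsv.CSVParser"));
    }
}
```

Note that the check runs in whichever JVM calls it. Running main() on the client machine only tells you about the client classpath, which `hadoop classpath` already showed. The task JVMs are what matter here.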

I'm not sure in which version DistributedCache was deprecated, but from Hadoop 2.2.0 onward you can use:

job.addFileToClassPath(new Path("pathToJar")); 
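
The sketch below places the Job.addFileToClassPath call in a complete driver. It assumes the Hadoop 2.x org.apache.hadoop.mapreduce API and the Hadoop client jars on the compile classpath.

The driver also runs the job through ToolRunner, for two reasons:

- Generic options such as -libjars are only honoured when the driver uses GenericOptionsParser, and ToolRunner does.
- Option parsing stops at the first non-option argument, so -libjars has to come before the input and output paths. In the command from the question it came after them.

The class name FlightsByCarrierDriver and the HDFS path /user/root/libs/opencsv-2.3.jar are hypothetical. The jar is assumed to have been copied to HDFS first, because a path without a scheme is resolved against the cluster's default file system.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver sketch; requires the Hadoop 2.x client jars to compile.
public class FlightsByCarrierDriver extends Configured implements Tool {

    // Hypothetical HDFS location; copy the jar there first, e.g.
    //   hadoop fs -put /home/tom/workspace/libs/opencsv-2.3.jar /user/root/libs/
    static final String OPENCSV_HDFS_PATH = "/user/root/libs/opencsv-2.3.jar";

    @Override
    public int run(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: FlightsByCarrierDriver [generic options] <input> <output>");
            return 2;
        }

        // getConf() already holds whatever GenericOptionsParser extracted
        // (-libjars, -D key=value, ...), because ToolRunner parsed it first.
        Job job = Job.getInstance(getConf(), "FlightsByCarrier");
        job.setJarByClass(FlightsByCarrierDriver.class);

        // Ship the jar through the distributed cache and add it to the
        // classpath of the map and reduce tasks.
        job.addFileToClassPath(new Path(OPENCSV_HDFS_PATH));

        // Configure mapper, reducer and output types here exactly as the
        // existing driver does, e.g. job.setMapperClass(FlightsByCarrierMapper.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner applies GenericOptionsParser, strips the generic options
        // and passes only the remaining arguments to run().
        int exitCode = ToolRunner.run(new Configuration(), new FlightsByCarrierDriver(), args);
        System.exit(exitCode);
    }
}
```

With a driver like this, either mechanism should be enough on its own. You can keep the addFileToClassPath call, or drop it and pass the local jar with -libjars placed before the paths:

hadoop jar FlightsByCarrier.jar FlightsByCarrierDriver -libjars /home/tom/workspace/libs/opencsv-2.3.jar /user/root/1987.csv /user/root/result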
