How to read multiple image files as input from HDFS in map-reduce?
Question
import java.awt.image.BufferedImage;
import java.io.FileInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import net.semanticmetadata.lire.imageanalysis.AutoColorCorrelogram; // LIRE; package may differ by LIRE version

public class image_mapreduce extends Configured implements Tool {

  private static String[] testFiles = new String[] {
      "img01.JPG", "img02.JPG", "img03.JPG", "img04.JPG", "img06.JPG", "img07.JPG", "img05.JPG"};
  // private static String testFilespath = "/home/student/Desktop/images";
  private static String testFilespath = "hdfs://localhost:54310/user/root/images";
  // private static String indexpath = "/home/student/Desktop/indexDemo";
  private static String testExtensive = "/home/student/Desktop/images";
  public static class MapClass extends MapReduceBase
      implements Mapper<Text, Text, Text, Text> {

    private Text input_image = new Text();
    private Text input_vector = new Text();

    @Override
    public void map(Text key, Text value, OutputCollector<Text, Text> output,
                    Reporter reporter) throws IOException {
      System.out.println("CorrelogramIndex Method:");
      String featureString;
      int MAXIMUM_DISTANCE = 16;
      AutoColorCorrelogram.Mode mode = AutoColorCorrelogram.Mode.FullNeighbourhood;
      for (String identifier : testFiles) {
        try (FileInputStream fis = new FileInputStream(testFilespath + "/" + identifier)) {
          // Document doc = builder.createDocument(fis, identifier);
          // FileInputStream imageStream = new FileInputStream(testFilespath + "/" + identifier);
          BufferedImage bimg = ImageIO.read(fis);
          AutoColorCorrelogram vd = new AutoColorCorrelogram(MAXIMUM_DISTANCE, mode);
          vd.extract(bimg);
          featureString = vd.getStringRepresentation();
          double[] bytearray = vd.getDoubleHistogram();
          System.out.println("image: " + identifier + " " + featureString);
        }
        System.out.println(" ------------- ");
        input_image.set(identifier);
        input_vector.set(featureString);
        output.collect(input_image, input_vector);
      }
    }
  }
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, Text, Text, Text> {

    @Override
    public void reduce(Text key, Iterator<Text> values,
                       OutputCollector<Text, Text> output,
                       Reporter reporter) throws IOException {
      // accumulate with a StringBuilder; note that String.concat() returns a new
      // String, so its result must not be discarded
      StringBuilder out_vector = new StringBuilder();
      while (values.hasNext()) {
        out_vector.append(values.next().toString());
      }
      output.collect(key, new Text(out_vector.toString()));
    }
  }
  static int printUsage() {
    System.out.println("image_mapreduce [-m <maps>] [-r <reduces>] <input> <output>");
    ToolRunner.printGenericCommandUsage(System.out);
    return -1;
  }
  @Override
  public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), image_mapreduce.class);
    conf.setJobName("image_mapreduce");

    // the keys are image file names (Text)
    conf.setOutputKeyClass(Text.class);
    // the values are feature strings (Text)
    conf.setOutputValueClass(Text.class);

    conf.setMapperClass(MapClass.class);
    // conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);
    List<String> other_args = new ArrayList<String>();
    for (int i = 0; i < args.length; ++i) {
      try {
        if ("-m".equals(args[i])) {
          conf.setNumMapTasks(Integer.parseInt(args[++i]));
        } else if ("-r".equals(args[i])) {
          conf.setNumReduceTasks(Integer.parseInt(args[++i]));
        } else {
          other_args.add(args[i]);
        }
      } catch (NumberFormatException except) {
        System.out.println("ERROR: Integer expected instead of " + args[i]);
        return printUsage();
      } catch (ArrayIndexOutOfBoundsException except) {
        System.out.println("ERROR: Required parameter missing from " + args[i - 1]);
        return printUsage();
      }
    }

    FileInputFormat.setInputPaths(conf, other_args.get(0));
    // FileInputFormat.setInputPaths(conf, new Path("hdfs://localhost:54310/user/root/images"));
    FileOutputFormat.setOutputPath(conf, new Path(other_args.get(1)));

    JobClient.runJob(conf);
    return 0;
  }
  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new image_mapreduce(), args);
    System.exit(res);
  }
}
I am writing a program which takes multiple image files, stored in HDFS, as input and extracts features in the map function. How can I specify the path that FileInputStream should read the images from? Or is there another way to read multiple image files?
What I want to do is: take multiple image files in HDFS as input, extract features in the map function, and reduce iteratively. Please help me with the code, or suggest a better way to do it.
Answer
Look into using the HIPI library - it stores a collection of images in an ImageBundle, which is more efficient than storing the individual image files in HDFS. They have a couple of examples too.
As for your code, you need to specify which input and output formats you plan to use. There is no stock input format that hands the entire file over as a single record, but you can extend FileInputFormat and create a RecordReader that emits <Text, BytesWritable> pairs, where the key is the filename and the value is the bytes of the image file.
In fact, Hadoop: The Definitive Guide has an example of this exact input format:
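The book's example targets the newer mapreduce API and keys records differently, so the following is only a minimal sketch adapted to the old org.apache.hadoop.mapred API used in the question, emitting the filename as the key as described above; the class names WholeFileInputFormat and WholeFileRecordReader are illustrative, not the book's exact code.

import java.io.IOException;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

// Input format that delivers each file as one <filename, file bytes> record.
public class WholeFileInputFormat extends FileInputFormat<Text, BytesWritable> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path filename) {
    // each image must be read as a single record, so never split the file
    return false;
  }

  @Override
  public RecordReader<Text, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, job);
  }

  // emits exactly one <Text filename, BytesWritable contents> pair per file
  public static class WholeFileRecordReader implements RecordReader<Text, BytesWritable> {
    private final FileSplit split;
    private final JobConf conf;
    private boolean processed = false;

    public WholeFileRecordReader(FileSplit split, JobConf conf) {
      this.split = split;
      this.conf = conf;
    }

    @Override
    public boolean next(Text key, BytesWritable value) throws IOException {
      if (processed) {
        return false;
      }
      Path file = split.getPath();
      FileSystem fs = file.getFileSystem(conf);
      byte[] contents = new byte[(int) split.getLength()];
      FSDataInputStream in = null;
      try {
        in = fs.open(file);
        IOUtils.readFully(in, contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      key.set(file.getName());
      value.set(contents, 0, contents.length);
      processed = true;
      return true;
    }

    @Override
    public Text createKey() { return new Text(); }

    @Override
    public BytesWritable createValue() { return new BytesWritable(); }

    @Override
    public long getPos() throws IOException { return processed ? split.getLength() : 0; }

    @Override
    public float getProgress() throws IOException { return processed ? 1.0f : 0.0f; }

    @Override
    public void close() throws IOException { /* nothing to close */ }
  }
}

To wire this into the job above, call conf.setInputFormat(WholeFileInputFormat.class), change the mapper to implement Mapper<Text, BytesWritable, Text, Text>, and decode each image inside map() with ImageIO.read(new ByteArrayInputStream(...)) over the value's bytes, trimming the buffer to value.getLength() first, since BytesWritable's backing array can be larger than the valid data. That removes the need to open a FileInputStream against an hdfs:// path, which is what fails in the original code.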