Key of type Object in the Hadoop mapper
Question
I am new to Hadoop and trying to understand the MapReduce WordCount example code from here.
The mapper in the documentation is declared as:
Mapper<KEYIN,VALUEIN,KEYOUT,VALUEOUT>
In the MapReduce word count example, the map method is declared as follows:
public void map(Object key, Text value, Context context)
Question: what is the point of this key of type Object? If the input to a mapper is a text document, I assume the value passed in would be the chunk of text (64 MB or 128 MB) that Hadoop has partitioned and stored in HDFS. More generally, what is the use of the input key KEYIN in the map code?
Any pointers would be appreciated.
Recommended answer
InputFormat describes the input specification for a MapReduce job. By default, Hadoop uses TextInputFormat, which extends FileInputFormat, to process the input files.
You can also specify the input format to use in the client or driver code:
job.setInputFormatClass(SomeInputFormat.class);
With TextInputFormat, files are broken into lines. The key is the position of the line in the file, and the value is the line of text. The mapper is therefore called once per line, not once per HDFS block.
In public void map(Object key, Text value, Context context), key is the line offset and value is the actual text of the line.
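To make the (offset, line) pairing concrete, here is a minimal plain-Java sketch that mimics the records a mapper would receive for a small newline-delimited file. It does not use any Hadoop classes; the class name LineOffsetDemo and the helper toRecords are hypothetical, and the sketch assumes UTF-8 text with '\n' line endings.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Plain-Java illustration only: no Hadoop classes are used here.
class LineOffsetDemo {

    // Returns (byte offset of line start -> line text), mimicking the
    // (key, value) pairs a mapper would see for a '\n'-delimited file.
    static Map<Long, String> toRecords(String fileContents) {
        Map<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : fileContents.split("\n")) {
            records.put(offset, line);
            // +1 accounts for the newline byte that terminated this line.
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String contents = "hello world\nfoo bar baz\nlast line\n";
        toRecords(contents).forEach((key, value) ->
                System.out.println("key=" + key + " value=" + value));
    }
}
```

For this input the keys are 0, 12 and 24: each key is the byte position at which the corresponding line starts. A word count mapper never needs that position, which is why the example can ignore the key entirely.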
See the TextInputFormat API documentation: https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/lib/input/TextInputFormat.html
By default, for TextInputFormat the key is of type LongWritable and the value is of type Text. In your example, Object is declared in place of LongWritable because it is compatible: LongWritable is a subclass of Object, so a parameter of type Object can receive it. You can also use LongWritable in place of Object, provided the mapper's KEYIN type parameter is changed to match.
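The compatibility point can be illustrated without Hadoop on the classpath. The following is a plain-Java analogy, not Hadoop code: MiniMapper is a hypothetical stand-in for Hadoop's Mapper, Long stands in for LongWritable, and String stands in for Text. The run method plays the role of the framework, which always supplies offset keys.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java analogy; MiniMapper is a hypothetical stand-in for Hadoop's Mapper,
// Long stands in for LongWritable and String stands in for Text.
class KeyTypeDemo {

    interface MiniMapper<KEYIN, VALUEIN> {
        String map(KEYIN key, VALUEIN value);
    }

    // Key declared as Object: accepts any key, and here simply ignores it.
    static class ObjectKeyMapper implements MiniMapper<Object, String> {
        @Override
        public String map(Object key, String value) {
            return value.toUpperCase();
        }
    }

    // Key declared with its concrete type: the offset can be used directly.
    static class LongKeyMapper implements MiniMapper<Long, String> {
        @Override
        public String map(Long key, String value) {
            return key + ":" + value;
        }
    }

    // The "framework" always supplies Long keys; both declarations accept them.
    static List<String> run(MiniMapper<? super Long, String> mapper, String... lines) {
        List<String> out = new ArrayList<>();
        long offset = 0;
        for (String line : lines) {
            out.add(mapper.map(Long.valueOf(offset), line));
            offset += line.length() + 1; // assumes single-byte characters plus '\n'
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(new ObjectKeyMapper(), "ab", "cde"));
        System.out.println(run(new LongKeyMapper(), "ab", "cde"));
    }
}
```

Declaring the key as Object is enough when the mapper does not care about the offset. Declaring it with the concrete type is the better choice when the mapper actually needs to read the offset value.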