How to serialize an object in Hadoop (in HDFS)
Question
I have a HashMap&lt;String, ArrayList&lt;Integer&gt;&gt;. I want to serialize my HashMap object (hmap) to an HDFS location and later deserialize it in the Mapper and Reducers in order to use it.
To serialize my HashMap object to HDFS I used normal Java object serialization, as follows, but got an error (permission denied):
try {
    // Writes "hashmap.ser" to the local working directory, not to HDFS --
    // this is the line that fails with "Permission denied" on the cluster.
    FileOutputStream fileOut = new FileOutputStream("hashmap.ser");
    ObjectOutputStream out = new ObjectOutputStream(fileOut);
    out.writeObject(hm);
    out.close();
} catch (Exception e) {
    e.printStackTrace();
}
I got the following exception:
java.io.FileNotFoundException: hashmap.ser (Permission denied)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:221)
at java.io.FileOutputStream.<init>(FileOutputStream.java:110)
at KMerIndex.createIndex(KMerIndex.java:121)
at MyDriverClass.formRefIndex(MyDriverClass.java:717)
at MyDriverClass.main(MyDriverClass.java:768)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Can someone please suggest or share sample code showing how to serialize an object to HDFS in Hadoop?
Answer
Please try using SerializationUtils from Apache Commons Lang.
Below are its methods:
static Object clone(Serializable object) //Deep clone an Object using serialization.
static Object deserialize(byte[] objectData) //Deserializes a single Object from an array of bytes.
static Object deserialize(InputStream inputStream) //Deserializes an Object from the specified stream.
static byte[] serialize(Serializable obj) //Serializes an Object to a byte array for storage/serialization.
static void serialize(Serializable obj, OutputStream outputStream) //Serializes an Object to the specified stream.
While storing to HDFS, you can store the byte[] returned from serialize. While reading the object back, you can cast the deserialized result to the corresponding type (for example, a File object) to get it back.
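As a minimal illustration of the round trip the answer describes, the sketch below serializes a HashMap&lt;String, ArrayList&lt;Integer&gt;&gt; to a byte[] and back using only java.io, which is essentially what SerializationUtils.serialize/deserialize wrap. The class and method names (HashMapSerDe, toBytes, fromBytes) are mine, not from the original answer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;

// Round-trips a HashMap<String, ArrayList<Integer>> through a byte[],
// mirroring what SerializationUtils.serialize/deserialize do internally.
public class HashMapSerDe {

    // Serialize the map to a byte array (comparable to SerializationUtils.serialize).
    public static byte[] toBytes(HashMap<String, ArrayList<Integer>> map) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(map);
        }
        return bos.toByteArray();
    }

    // Deserialize the byte array back into a HashMap
    // (comparable to SerializationUtils.deserialize plus the cast).
    @SuppressWarnings("unchecked")
    public static HashMap<String, ArrayList<Integer>> fromBytes(byte[] data)
            throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (HashMap<String, ArrayList<Integer>>) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        HashMap<String, ArrayList<Integer>> hmap = new HashMap<>();
        hmap.put("kmer1", new ArrayList<>(Arrays.asList(1, 5, 9)));
        byte[] bytes = toBytes(hmap);
        HashMap<String, ArrayList<Integer>> restored = fromBytes(bytes);
        System.out.println(restored.equals(hmap)); // prints "true"
    }
}
```

On the cluster, the byte[] produced this way would then be written to an HDFS path (e.g. via an output stream obtained from org.apache.hadoop.fs.FileSystem) rather than to a local file, which avoids the "Permission denied" from the question's local FileOutputStream.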
In my case, I stored a HashMap in an HBase column and retrieved it back in my mapper method as a HashMap, and it worked successfully.
Surely, you can also do that in the same way...
Another thing: you can also use Apache Commons IO (org.apache.commons.io.FileUtils); but you then need to copy this file to HDFS afterwards, since you want HDFS as the datastore.
FileUtils.writeByteArrayToFile(new File("pathname"), myByteArray);
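A small sketch of that two-step approach: write the serialized bytes to a local file, then copy that file into HDFS. To keep the example self-contained, java.nio.file.Files stands in for FileUtils.writeByteArrayToFile (same effect), and the HDFS copy step is shown only as a comment, since it needs a live cluster; the class name LocalThenHdfs and the paths are illustrative assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Step 1: write a serialized byte[] to a local file.
// Step 2 (cluster only): copy the local file into HDFS.
public class LocalThenHdfs {

    // Equivalent of FileUtils.writeByteArrayToFile(new File(...), myByteArray),
    // using only the JDK.
    public static Path writeLocal(byte[] myByteArray, Path target) throws IOException {
        return Files.write(target, myByteArray);
    }

    public static void main(String[] args) throws Exception {
        byte[] myByteArray = "serialized-hashmap".getBytes(StandardCharsets.UTF_8);
        Path local = writeLocal(myByteArray, Files.createTempFile("hashmap", ".ser"));
        System.out.println(Files.readAllBytes(local).length == myByteArray.length);

        // On the cluster you would then push the local file into HDFS, e.g.:
        // FileSystem fs = FileSystem.get(new Configuration());
        // fs.copyFromLocalFile(new org.apache.hadoop.fs.Path(local.toString()),
        //                      new org.apache.hadoop.fs.Path("/user/me/hashmap.ser"));
    }
}
```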
Note: both jars, apache commons io and apache commons lang, are always available in a hadoop cluster.