Files.lines to skip broken lines in Java 8
Problem description
I am reading a very large (500 MB) file with Files.lines(...). It reads part of the file, but at some point it fails with java.io.UncheckedIOException: java.nio.charset.MalformedInputException: Input length = 1
I think the file has lines in different charsets. Is there a way to skip these broken lines? I know that the returned stream is backed by a Reader, and I know how to skip input with a Reader, but I don't know how to get at the Reader behind the stream to configure it the way I want.
List<String> lines = new ArrayList<>();
try (Stream<String> stream = Files.lines(Paths.get(getClass().getClassLoader().getResource("bigtest.txt").toURI()), Charset.forName("UTF-8"))) {
    stream
        .filter(s -> s.substring(0, 2).equalsIgnoreCase("aa"))
        .forEach(lines::add);
} catch (final IOException e) {
    // catch
}
Recommended answer
You can't filter out lines with invalid characters after decoding, because the preconfigured decoder has already aborted with an exception by then. You have to configure a CharsetDecoder manually and tell it either to ignore invalid input or to replace that input with a special character.
// Drop malformed byte sequences instead of throwing MalformedInputException
CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
    .onMalformedInput(CodingErrorAction.IGNORE);
Path path = Paths.get(getClass().getClassLoader().getResource("bigtest.txt").toURI());
List<String> lines;
try (Reader r = Channels.newReader(FileChannel.open(path), dec, -1);
     BufferedReader br = new BufferedReader(r)) {
    lines = br.lines()
        // regionMatches is safe for lines shorter than two characters
        .filter(s -> s.regionMatches(true, 0, "aa", 0, 2))
        .collect(Collectors.toList());
}
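To make the effect of CodingErrorAction.IGNORE concrete, here is a minimal sketch that decodes an in-memory byte array instead of a file. The class name IgnoreDemo, the helper decodeIgnoring and the sample bytes are made up for illustration; only the decoder configuration comes from the answer.

```java
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class IgnoreDemo {
    // Hypothetical helper: decodes bytes as UTF-8 and silently drops
    // malformed byte sequences instead of throwing.
    static String decodeIgnoring(byte[] bytes) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE);
        try {
            return dec.decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            // Not expected for malformed input with IGNORE; rethrown unchecked for brevity.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // 0xE9 is 'é' in ISO-8859-1, but here it is not a valid UTF-8 sequence.
        byte[] bytes = {'a', 'a', ' ', 'c', 'a', 'f', (byte) 0xE9, ' ', 'x'};
        System.out.println(decodeIgnoring(bytes)); // prints "aa caf x"
    }
}
```

The offending byte simply disappears from the decoded text, so the line survives in a slightly damaged form.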
This simply ignores charset decoding errors, dropping the offending characters. To skip entire lines that contain errors, you can let the decoder insert a replacement character (which defaults to '\ufffd') for each error and then filter out the lines containing that character:
CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
    .onMalformedInput(CodingErrorAction.REPLACE);
Path path = Paths.get(getClass().getClassLoader().getResource("bigtest.txt").toURI());
List<String> lines;
try (Reader r = Channels.newReader(FileChannel.open(path), dec, -1);
     BufferedReader br = new BufferedReader(r)) {
    lines = br.lines()
        .filter(s -> !s.contains(dec.replacement()))
        .filter(s -> s.regionMatches(true, 0, "aa", 0, 2))
        .collect(Collectors.toList());
}
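Since the question asked for something that behaves like Files.lines, the same idea can be wrapped in a lazy Stream that closes its reader when the stream is closed. This is only a sketch: the class LenientLines, the helpers lenientLines and demo, and the temp-file contents are assumptions for illustration, not part of the original answer.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class LenientLines {
    // Hypothetical helper: lazy line stream that drops lines with malformed UTF-8.
    static Stream<String> lenientLines(Path path) throws IOException {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE);
        String marker = dec.replacement(); // "\uFFFD" unless reconfigured
        BufferedReader br = new BufferedReader(
                Channels.newReader(FileChannel.open(path), dec, -1));
        return br.lines()
                .filter(s -> !s.contains(marker))
                .onClose(() -> {
                    try {
                        br.close(); // also closes the underlying channel
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                });
    }

    // Writes a small sample file whose second line contains an invalid byte,
    // then reads it back through lenientLines.
    static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("lenient", ".txt");
            byte[] data = {'a', 'a', '1', '\n',
                           'a', 'a', (byte) 0xE9, '2', '\n',
                           'a', 'a', '3', '\n'};
            Files.write(tmp, data);
            try (Stream<String> lines = lenientLines(tmp)) {
                return lines.collect(Collectors.toList());
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [aa1, aa3]
    }
}
```

As with Files.lines, the caller should use try-with-resources so the file is released. One caveat of filtering on the replacement character: a line that legitimately contains a correctly encoded U+FFFD is dropped as well.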