Univocity - How to return one bean per row using iterator style?
Problem description
I am building a process to merge a few large sorted CSV files. I am currently looking into using Univocity to do this. My plan for the merge is to use beans that implement the Comparable interface.
The simplified file looks like this:
id,data
1,aa
2,bb
3,cc
The bean looks like this (getters and setters omitted):
public class Address implements Comparable<Address> {

    @Parsed
    private int id;

    @Parsed
    private String data;

    @Override
    public int compareTo(Address o) {
        return Integer.compare(this.getId(), o.getId());
    }
}
The comparator looks like this:
public class AddressComparator implements Comparator<Address> {

    @Override
    public int compare(Address a, Address b) {
        if (a == null)
            throw new IllegalArgumentException("argument object a cannot be null");
        if (b == null)
            throw new IllegalArgumentException("argument object b cannot be null");
        return Integer.compare(a.getId(), b.getId());
    }
}
As I do not want to read all the data into memory, I want to read the top record of each file and execute some compare logic. Here is my simplified example:
public class App {

    private static final String INPUT_1 = "src/test/input/address1.csv";
    private static final String INPUT_2 = "src/test/input/address2.csv";
    private static final String INPUT_3 = "src/test/input/address3.csv";

    public static void main(String[] args) throws FileNotFoundException {
        BeanListProcessor<Address> rowProcessor = new BeanListProcessor<Address>(Address.class);
        CsvParserSettings parserSettings = new CsvParserSettings();
        parserSettings.setRowProcessor(rowProcessor);
        parserSettings.setHeaderExtractionEnabled(true);
        CsvParser parser = new CsvParser(parserSettings);

        List<FileReader> readers = new ArrayList<>();
        readers.add(new FileReader(new File(INPUT_1)));
        readers.add(new FileReader(new File(INPUT_2)));
        readers.add(new FileReader(new File(INPUT_3)));

        // This parses all rows, but I am only interested in getting 1 row as a bean.
        for (FileReader fileReader : readers) {
            parser.parse(fileReader);
            List<Address> beans = rowProcessor.getBeans();
            for (Address address : beans) {
                System.out.println(address.toString());
            }
        }

        // want to have a map with the reader and the first bean object
        // Map<FileReader, Address> topRecordofReader = new HashMap<>();
        Map<FileReader, String[]> topRecordofReader = new HashMap<>();
        for (FileReader reader : readers) {
            parser.beginParsing(reader);
            String[] row;
            while ((row = parser.parseNext()) != null) {
                System.out.println(row[0]);
                System.out.println(row[1]);
                topRecordofReader.put(reader, row);
                // all done, only want to get first row
                break;
            }
        }
    }
}
Question
Given the example above, how do I parse in such a way that it iterates over each row and returns one bean per row, instead of parsing the whole file?
I am looking for something like this (this non-working code only indicates the kind of solution I am looking for):
for (FileReader fileReader : readers) {
    parser.beginParsing(fileReader);
    Address bean = null;
    while (bean = parser.parseNextRecord() != null) {
        topRecordofReader.put(fileReader, bean);
    }
}
Recommended answer
There are two ways to read iteratively instead of loading everything into memory. The first is to use a BeanProcessor instead of a BeanListProcessor:
settings.setRowProcessor(new BeanProcessor<Address>(Address.class) {
    @Override
    public void beanProcessed(Address address, ParsingContext context) {
        // your code to process each parsed object goes here
    }
});
To read beans iteratively without a callback (and to perform some other common tasks), we created the CsvRoutines class (which extends AbstractRoutines):
File input = new File("/path/to/your.csv");
CsvParserSettings parserSettings = new CsvParserSettings();
// ...configure the parser

// You can also use TSV and Fixed-width routines
CsvRoutines routines = new CsvRoutines(parserSettings);
for (Address address : routines.iterate(Address.class, input, "UTF-8")) {
    // process your bean
}
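Since the original goal was to merge several sorted files, here is a minimal sketch of how per-file iterators could feed a k-way merge that keeps only one pending element per source in memory. It uses only the JDK: the names SortedMerge, Head and merge are hypothetical and not part of Univocity, and plain integers stand in for Address beans. With Univocity, the iterators would presumably come from calling iterator() on the result of routines.iterate(...) for each file, and the comparator could be the AddressComparator from the question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper (not part of Univocity): merges already-sorted iterators
// while holding only one pending element per source.
class SortedMerge {

    // Pairs the current head element of a source with the iterator it came from.
    private static final class Head<T> {
        final T value;
        final Iterator<T> source;

        Head(T value, Iterator<T> source) {
            this.value = value;
            this.source = source;
        }
    }

    static <T> List<T> merge(List<Iterator<T>> sources, Comparator<? super T> order) {
        // The queue is ordered by the head values, so poll() yields the smallest one.
        PriorityQueue<Head<T>> heads =
                new PriorityQueue<>((a, b) -> order.compare(a.value, b.value));

        // Seed the queue with the first element of each non-empty source.
        for (Iterator<T> source : sources) {
            if (source.hasNext()) {
                heads.add(new Head<>(source.next(), source));
            }
        }

        List<T> merged = new ArrayList<>();
        while (!heads.isEmpty()) {
            Head<T> smallest = heads.poll();
            // Collected in a list for the demo; a real merge would write the row out.
            merged.add(smallest.value);
            // Refill from the same source so every source keeps at most one head queued.
            if (smallest.source.hasNext()) {
                heads.add(new Head<>(smallest.source.next(), smallest.source));
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // In-memory stand-ins for the per-file bean iterators.
        List<Iterator<Integer>> sources = new ArrayList<>();
        sources.add(Arrays.asList(1, 4, 7).iterator());
        sources.add(Arrays.asList(2, 5).iterator());
        sources.add(Arrays.asList(3, 6, 8).iterator());

        System.out.println(merge(sources, Comparator.naturalOrder()));
        // prints [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```

The sketch collects the merged output in a list only to keep the demo short. For large files, the line that adds to the list would be replaced by writing each element to the output as it is polled, so memory use stays proportional to the number of input files rather than their size.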
Hope this helps!