Simplest way to read a CSV file mapped to memory?
Question
When I read from files in C++(11), I map them into memory using:
#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>

boost::interprocess::file_mapping* fm = new boost::interprocess::file_mapping(path, boost::interprocess::read_only);
boost::interprocess::mapped_region* region = new boost::interprocess::mapped_region(*fm, boost::interprocess::read_only);
char* bytes = static_cast<char*>(region->get_address());
This is fine when I want to read byte by byte extremely fast. However, I have created a CSV file which I would like to map into memory, read line by line, and split each line on the commas.
Is there a way to do this with a few modifications to the code above?
(I am mapping into memory because I have plenty of memory and I do not want any bottleneck from disk/IO streaming.)
Answer
Here's my take on "fast enough": it zips through 116 MiB of CSV (2.5 million lines[1]) in roughly 1 second.
The result is then randomly accessible at zero copy, so there is no overhead (unless pages are swapped out).
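For reference, here is a minimal sketch of that kind of single pass over the mapping, not the answer's actual implementation: the file name data.csv is a placeholder, std::string_view needs C++17 (boost::string_ref works the same way on C++11), and the loop assumes plain '\n'-terminated lines with no quoted fields containing commas. Every field is handed out as a view into the mapping, so nothing is copied.

#include <boost/interprocess/file_mapping.hpp>
#include <boost/interprocess/mapped_region.hpp>
#include <algorithm>
#include <cstring>
#include <iostream>
#include <string_view>

int main() {
    namespace bip = boost::interprocess;

    // Map the whole file read-only, exactly as in the question.
    bip::file_mapping fm("data.csv", bip::read_only);
    bip::mapped_region region(fm, bip::read_only);

    const char* p   = static_cast<const char*>(region.get_address());
    const char* end = p + region.get_size();

    std::size_t lines = 0, fields = 0, longest = 0;
    while (p < end) {
        // Find the end of the current line (a missing final '\n' is treated as end of file).
        const char* eol = static_cast<const char*>(std::memchr(p, '\n', end - p));
        if (!eol) eol = end;

        // Split the line on commas; every field is a view into the mapping, no copies made.
        const char* begin = p;
        for (const char* q = p; ; ++q) {
            if (q == eol || *q == ',') {
                std::string_view field(begin, static_cast<std::size_t>(q - begin));
                ++fields;
                longest = std::max(longest, field.size());
                begin = q + 1;
                if (q == eol) break;
            }
        }

        ++lines;
        p = eol + 1;  // step past the newline
    }
    std::cout << lines << " lines, " << fields << " fields, longest field " << longest << " bytes\n";
}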
For comparison:
- that's ~3x faster than a naive
wc csv.txt
takes on the same file
- it's about as fast as the following perl one-liner (which lists the distinct field counts on all lines):
perl -ne '$fields{scalar split /,/}++; END { print "$_\n" for sort keys %fields }' csv.txt
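As for what a "zero-copy, randomly accessible" result can look like, here is one sketch (again an illustration, not the original answer's code): the parse step records only std::string_view slices of the mapped region, so looking up a cell afterwards never copies the CSV text.

#include <string_view>
#include <vector>

// A parsed "table" that owns no text: every cell is a view into the still-mapped
// file, so building the index and reading cells never copies the CSV data.
using Row   = std::vector<std::string_view>;
using Table = std::vector<Row>;

// Build the index over the mapped bytes [data, data + size).
// Same caveats as above: '\n' line endings, no quoted fields containing commas.
inline Table index_csv(const char* data, std::size_t size) {
    Table table;
    const char* end = data + size;
    for (const char* p = data; p < end; ) {
        Row row;
        const char* begin = p;
        for (const char* q = p; ; ++q) {
            if (q == end || *q == '\n' || *q == ',') {
                row.emplace_back(begin, static_cast<std::size_t>(q - begin));
                begin = q + 1;
                if (q == end || *q == '\n') { p = q + 1; break; }
            }
        }
        table.push_back(std::move(row));
    }
    return table;
}

// Usage (bytes and region come from the mapping in the question):
//   Table t = index_csv(bytes, region->get_size());
//   std::string_view cell = t[1000000][3];   // random access, zero-copy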