Decompress and read Dukascopy .bi5 tick files
Question
To cut a long story short: I need to open a .bi5 file and read its contents. The problem is that I have tens of thousands of .bi5 files containing time-series data that I need to decompress and process (read them and load them into pandas).
I ended up installing Python 3 (I normally use 2.7) specifically for the lzma module, because compiling the lzma backports for Python 2.7 was a nightmare. So I conceded and switched to Python 3, but still without success. The problems are too numerous to list; no one reads long questions!
I have included one of the .bi5 files. Ideally, someone could get it into a pandas DataFrame and show me how they did it.
PS: the file is only a few KB, so it will download in a second. Thanks very much in advance.
(File) http://www.filedropper.com/13hticks
Recommended answer
The code below should do the trick. It opens the file, decompresses it with lzma, and then uses struct to unpack the binary data into fixed-size records.
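import lzma
import struct

import pandas as pd


def bi5_to_df(filename, fmt):
    # Size in bytes of one tick record for the given struct format
    chunk_size = struct.calcsize(fmt)
    data = []
    # lzma.open transparently decompresses the LZMA stream inside the .bi5 file
    with lzma.open(filename) as f:
        while True:
            chunk = f.read(chunk_size)
            if chunk:
                # Unpack one fixed-size binary record into a tuple of numbers
                data.append(struct.unpack(fmt, chunk))
            else:
                break  # end of the decompressed data
    df = pd.DataFrame(data)
    return df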
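With tens of thousands of files, the per-record loop above may become a bottleneck. One possible alternative, sketched here under assumptions and not taken from the answer, is to decompress each file in a single read and let NumPy reinterpret the bytes with a structured dtype equivalent to '>3I2f'. The function name bi5_to_df_fast and the field names are hypothetical labels; the field meanings are inferred from the comparison with ninety47's output further down.

```python
import lzma

import numpy as np
import pandas as pd

# Structured dtype equivalent to struct format '>3I2f': three big-endian
# uint32 fields followed by two big-endian float32 fields, 20 bytes per tick.
# ASSUMPTION: field names are labels inferred from ninety47's output.
TICK_DTYPE = np.dtype([
    ("time_ms", ">u4"),
    ("ask", ">u4"),
    ("bid", ">u4"),
    ("ask_vol", ">f4"),
    ("bid_vol", ">f4"),
])


def bi5_to_df_fast(filename):
    """Hypothetical vectorized variant of bi5_to_df (not from the original answer)."""
    with lzma.open(filename) as f:
        raw = f.read()  # decompress the whole file in one call
    # Reinterpret the bytes as an array of records without a Python-level loop;
    # raises ValueError if the length is not a multiple of 20 bytes
    records = np.frombuffer(raw, dtype=TICK_DTYPE)
    # Convert each field to native byte order before handing it to pandas
    return pd.DataFrame({
        name: records[name].astype(TICK_DTYPE[name].newbyteorder("="))
        for name in TICK_DTYPE.names
    })
```

For example, `bi5_to_df_fast('01h_ticks.bi5')` should give the same numbers as `bi5_to_df('01h_ticks.bi5', '>3I2f')`, just with named columns. The trade-off is memory: each file's decompressed bytes are held in full, which should be small for hourly tick files.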
The most important thing is to know the right format. I googled around and tried a few guesses, and '>3i2f' (or '>3I2f') works quite well: it means big-endian, three 4-byte integers followed by two 4-byte floats, i.e. 20 bytes per tick. (The format you suggested, 'i4f', doesn't produce sensible floats, whether read as big- or little-endian.) For struct and its format syntax, see the docs.
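A quick way to check what a format string means is to let struct report the record size and round-trip one record. The values below are simply copied from the first tick in the comparison output in the update; they are not read from a real file.

```python
import struct

FMT = ">3I2f"  # big-endian: 3 unsigned 32-bit ints, then 2 32-bit floats

# '>' selects standard sizes with no padding, so one tick is 3*4 + 2*4 bytes
record_size = struct.calcsize(FMT)
print(record_size)  # 20

# Pack one record by hand, then unpack it again
record = struct.pack(FMT, 3581, 131966, 131945, 1.5, 1.5)
print(len(record))  # 20
print(struct.unpack(FMT, record))  # (3581, 131966, 131945, 1.5, 1.5)
```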
df = bi5_to_df('13h_ticks.bi5', '>3i2f')
df.head()
Out[177]: 
      0       1       2     3     4
0   210  110218  110216  1.87  1.12
1   362  110219  110216  1.00  5.85
2   875  110220  110217  1.00  1.12
3  1408  110220  110218  1.50  1.00
4  1884  110221  110219  3.94  1.00
Update
To compare the output of bi5_to_df with https://github.com/ninety47/dukascopy, I compiled and ran test_read_bi5 from that repository. The first lines of its output are:
time, bid, bid_vol, ask, ask_vol
2012-Dec-03 01:00:03.581000, 131.945, 1.5, 131.966, 1.5
2012-Dec-03 01:00:05.142000, 131.943, 1.5, 131.964, 1.5
2012-Dec-03 01:00:05.202000, 131.943, 1.5, 131.964, 2.25
2012-Dec-03 01:00:05.321000, 131.944, 1.5, 131.964, 1.5
2012-Dec-03 01:00:05.441000, 131.944, 1.5, 131.964, 1.5
And bi5_to_df on the same input file gives:
bi5_to_df('01h_ticks.bi5', '>3I2f').head()
Out[295]: 
      0       1       2     3    4
0  3581  131966  131945  1.50  1.5
1  5142  131964  131943  1.50  1.5
2  5202  131964  131943  2.25  1.5
3  5321  131964  131944  1.50  1.5
4  5441  131964  131944  1.50  1.5
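So everything seems to be fine; ninety47's code just orders the columns differently. Matching the two outputs row by row: column 0 is the time offset in milliseconds from the start of the hour (3581 corresponds to 01:00:03.581), columns 1 and 2 are the ask and bid prices as integers (131966 corresponds to 131.966, so for this instrument they are divided by 1000), and columns 3 and 4 are the ask and bid volumes.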
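Once the columns are identified, the raw integers can be turned into a more readable frame. The helper below is a hypothetical sketch: label_ticks is not part of either codebase, the hour's start time must be supplied by the caller (for example parsed from the file's path, which is an assumption about how the files are stored), and the price divisor of 1000 only matches the instrument in this comparison; other instruments may need a different value.

```python
import pandas as pd


def label_ticks(raw, hour_start, point=1000):
    """Hypothetical helper: give bi5_to_df output named, timestamped columns.

    raw: DataFrame with columns 0..4, as returned by bi5_to_df(..., '>3I2f')
    hour_start: pd.Timestamp of the hour the file covers (supplied by the caller)
    point: ASSUMED price divisor; 1000 matches the comparison above only
    """
    return pd.DataFrame({
        # column 0: milliseconds since the start of the hour
        "time": hour_start + pd.to_timedelta(raw[0], unit="ms"),
        # columns 1 and 2: ask and bid stored as integers
        "ask": raw[1] / point,
        "bid": raw[2] / point,
        # columns 3 and 4: ask and bid volumes, already floats
        "ask_vol": raw[3],
        "bid_vol": raw[4],
    })
```

For example, `label_ticks(bi5_to_df('01h_ticks.bi5', '>3I2f'), pd.Timestamp('2012-12-03 01:00'))` should reproduce the time, bid and ask values printed by test_read_bi5 above.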
Also, it's probably more accurate to use '>3I2f' instead of '>3i2f' (i.e. unsigned int instead of int).
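The difference between 'i' and 'I' only shows up when the top bit of a field is set: the same four bytes decode to a negative number as a signed int and to a large positive one as an unsigned int. The byte string below is a made-up value chosen to show this, not data from a .bi5 file.

```python
import struct

# Four bytes with only the top bit set (illustrative value)
raw = b"\x80\x00\x00\x00"

print(struct.unpack(">i", raw)[0])  # -2147483648 (signed: top bit is the sign)
print(struct.unpack(">I", raw)[0])  # 2147483648 (unsigned: same bits, no sign)

# Below 2**31 both codes decode the same bytes to the same number
print(struct.unpack(">i", struct.pack(">I", 131966))[0])  # 131966
```

Since the time offsets and prices in the outputs above are far below 2**31, both codes give the same numbers for them.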