Is ndarray access faster than recarray access?

2023-09-27 · Python development questions


Question

I was able to copy my recarray data to an ndarray, do some calculations, and return the ndarray with updated values.
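The copy-to-ndarray workflow can be sketched with `numpy.lib.recfunctions.structured_to_unstructured()`. This is a minimal sketch, not the asker's code: the field names `a`, `b`, `c` and the calculation are hypothetical, and every field is assumed to be float64 so the fields map cleanly onto the columns of one 2-D array.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical recarray with three float64 fields.
rec = np.rec.fromarrays(
    [np.arange(4.0), np.arange(4.0) * 10, np.ones(4)],
    names="a,b,c",
)

# Copy the fields into a plain 2-D float ndarray, one column per field.
# copy=True keeps later writes from reaching the original recarray;
# np.asarray() drops any array subclass the function may hand back.
work = np.asarray(rfn.structured_to_unstructured(rec, copy=True))

# Illustrative calculation on whole columns: column c = a + b.
work[:, 2] = work[:, 0] + work[:, 1]

print(work.shape)  # (4, 3)
```

Working on whole columns of a single-dtype ndarray like this avoids per-field name lookups entirely.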

Then I discovered the append_fields() function in numpy.lib.recfunctions and thought it would be a lot smarter to simply append 2 fields to my original recarray to hold my calculated values.
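A quick way to see what `append_fields()` actually returns is to check the type of its result. In this minimal sketch the base array and the field names `x`, `y`, `total` and `diff` are hypothetical; `usemask` and `asrecarray` are real `append_fields()` parameters, and `usemask` defaults to `True`.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical base data: a small structured array with two float fields.
base = np.zeros(5, dtype=[("x", "f8"), ("y", "f8")])
base["x"] = np.arange(5)
base["y"] = np.arange(5) * 2.0

# Hypothetical calculated values to store in two new fields.
total = base["x"] + base["y"]
diff = base["x"] - base["y"]

# Default call (usemask=True): the result is a numpy.ma.MaskedArray.
masked = rfn.append_fields(base, ["total", "diff"], [total, diff])

# usemask=False: the result is a plain structured ndarray.
plain = rfn.append_fields(base, ["total", "diff"], [total, diff], usemask=False)

# usemask=False plus asrecarray=True: the result is a np.recarray.
rec = rfn.append_fields(base, ["total", "diff"], [total, diff],
                        usemask=False, asrecarray=True)

print(type(masked).__name__)  # MaskedArray
print(type(plain).__name__)   # ndarray
print(type(rec).__name__)     # recarray
```

So unless `usemask=False` is passed, the appended fields come back in a masked array rather than in a recarray.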

When I did this, I found the operation was much, much slower. I didn't have to time it: the ndarray-based process takes a few seconds, compared to more than a minute with the recarray, and my test arrays are small (<10,000 rows).

Is this typical? Is ndarray access really that much faster than recarray access? I expected some performance degradation due to access by field name, but not this much.

Recommended answer

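Update, November 15, 2018
I extended my timing tests to clarify the performance differences between ndarray, structured array, recarray, and masked array (a type of record array?). Each has subtle differences. See the discussion here: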
numpy-discussion:structured-arrays-recarrays-and-record-arrays
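One visible difference between the four containers is what a single row is. The sketch below builds all four from the same hypothetical 3 x 12 float array (the field names `f0` to `f11` are illustrative, not from the original data) and prints the type of one row from each. It should print `ndarray`, `void`, `record` and `mvoid`, in that order.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical data: 3 rows of 12 float64 values (0.0 ... 35.0).
values = np.arange(36, dtype="f8").reshape(3, 12)

# The same data as a structured array with illustrative field names f0..f11,
# as a recarray view of it, and as a masked structured array.
dt = np.dtype([(f"f{i}", "f8") for i in range(12)])
structured = rfn.unstructured_to_structured(values, dt)
rec = structured.view(np.recarray)
masked = np.ma.masked_array(structured)

# Indexing a single row returns a different kind of object from each container.
for label, arr in [("ndarray", values), ("structured", structured),
                   ("recarray", rec), ("masked", masked)]:
    print(f"{label:>10}: {type(arr[1]).__name__}")
```

All three record-style rows support field access (`structured[1]["f3"]`, `rec[1].f3`, `masked[1]["f3"]`), but `np.record` and the masked `mvoid` rows add Python-level indexing logic on top of `np.void`, which is one plausible source of the extra per-row cost.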

Here are the results of my performance tests. I built a very simple example (using one of my HDF5 datasets) to compare performance with the same data stored in 4 types of arrays: ndarray, structured array, recarray, and masked array. After the arrays are constructed, they are passed to a function that simply loops through each row and extracts 12 values from each row. Each function is called from timeit with a single pass (number=1). This test measures only the array-read function and excludes all other calculations.
Results for 9,000 rows (in seconds, as returned by timeit):

for ndarray: 0.034137165047070615
for structured array: 0.1306827116913577
for recarray: 0.446010040784266
for masked array: 31.33269560998199

Based on this test, access performance decreases in the order listed above. Structured-array and recarray access are about 4x and 13x slower than ndarray access, respectively (though all three still take only a fraction of a second). Masked-array access, however, is roughly 900x slower than ndarray access (31.3 s vs. 0.034 s). Since append_fields() returns a masked array unless it is called with usemask=False, that explains the seconds-to-minutes difference I see in my complete example. Hopefully this data is useful to others who encounter this issue.
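For anyone who wants to rerun this kind of comparison, here is a minimal sketch of a row-by-row read test. It is not the author's script: random data stands in for the HDF5 dataset, `read_rows()` is a hypothetical helper that reads the 12 values by integer position, and absolute times will depend on the NumPy version and the machine.

```python
import timeit
from functools import partial

import numpy as np
from numpy.lib import recfunctions as rfn

N_ROWS, N_COLS = 9_000, 12

# Hypothetical stand-in for the HDF5 data: 9,000 rows of 12 floats.
values = np.random.default_rng(0).random((N_ROWS, N_COLS))

# The same data held in the four containers compared above.
dt = np.dtype([(f"f{i}", "f8") for i in range(N_COLS)])
structured = rfn.unstructured_to_structured(values, dt)
arrays = {
    "ndarray": values,
    "structured array": structured,
    "recarray": structured.view(np.recarray),
    "masked array": np.ma.masked_array(structured),
}


def read_rows(arr):
    """Loop over every row and read its 12 values one at a time."""
    total = 0.0
    for row in arr:
        for j in range(N_COLS):
            total += row[j]  # integer position works for all four row types
    return total


for name, arr in arrays.items():
    # number=1 means a single pass, matching the answer's timeit setting.
    seconds = timeit.timeit(partial(read_rows, arr), number=1)
    print(f"{name:>16}: {seconds:.4f} s")
```

Only the relative ordering is worth comparing with the numbers above; the helper sums the values so the loop cannot be skipped and the four containers can be checked against each other.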
