Emulate "double" using 2 "float"s
Question
I am writing a program for embedded hardware that only supports 32-bit single-precision floating-point arithmetic. The algorithm I am implementing, however, requires 64-bit double-precision addition and comparison. I am trying to emulate the double datatype using a tuple of two floats. So a double d will be emulated as a struct containing the tuple: (float d. float d.low).
The comparison should be straightforward using a lexicographic ordering. The addition, however, is a bit tricky because I am not sure which base I should use. Should it be FLT_MAX? And how can I detect a carry?
How can this be done?
Edit (Clarity): I need the extra significant digits rather than the extra range.
Recommended answer
Double-float is a technique that uses pairs of single-precision numbers to achieve almost twice the precision of single-precision arithmetic, at the cost of a slight reduction of the single-precision exponent range (due to intermediate underflow and overflow at the far ends of the range). The basic algorithms were developed by T.J. Dekker and William Kahan in the 1970s. Below I list two fairly recent papers that show how these techniques can be adapted to GPUs. Much of the material covered in these papers applies independently of the platform, so it should be useful for the task at hand.
https://hal.archives-ouvertes.fr/hal-00021443 Guillaume Da Graça, David Defour: Implementation of float-float operators on graphics hardware. 7th Conference on Real Numbers and Computers, RNC7.
http://andrewthall.org/papers/df64_qf128.pdf Andrew Thall: Extended-Precision Floating-Point Numbers for GPU Computation.
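To make the idea concrete, here is a minimal sketch in C of double-float addition and comparison. It is an illustration under stated assumptions, not code taken from the papers above. The type name dfloat, its fields hi and lo, and the function names are all hypothetical. The sketch assumes IEEE-754 round-to-nearest float arithmetic, no fast-math style reassociation by the compiler, and no excess intermediate precision (the volatile temporaries are one way to discourage the latter two). There is no "base" and no explicit carry: the error-free transformations TwoSum (Knuth) and FastTwoSum (Dekker) recover the rounding error of each float addition, and that error is what gets propagated into the low word. Overflow, infinities and NaNs are not handled.

```c
/* Hypothetical double-float type: the represented value is hi + lo.
 * In normalized form, lo is at most half an ulp of hi in magnitude. */
typedef struct { float hi; float lo; } dfloat;

/* TwoSum (Knuth): returns s = fl(a + b) in .hi and the rounding error
 * in .lo, so that a + b == s + err holds exactly. No ordering of a and b
 * is required. Each step is stored in a volatile float so that it is
 * rounded to single precision and not rearranged. */
static dfloat two_sum(float a, float b)
{
    volatile float s  = a + b;
    volatile float bv = s - a;   /* portion of b that ended up in s */
    volatile float av = s - bv;  /* portion of a that ended up in s */
    volatile float eb = b - bv;  /* what was lost from b */
    volatile float ea = a - av;  /* what was lost from a */
    dfloat r = { s, ea + eb };
    return r;
}

/* FastTwoSum (Dekker): same result as TwoSum with fewer operations,
 * but only valid when |a| >= |b| (or a == 0). */
static dfloat fast_two_sum(float a, float b)
{
    volatile float s  = a + b;
    volatile float bv = s - a;
    dfloat r = { s, b - bv };
    return r;
}

/* Double-float addition: add the high and low words separately with
 * TwoSum, then fold the error terms back in and renormalize twice. */
static dfloat df_add(dfloat x, dfloat y)
{
    dfloat s = two_sum(x.hi, y.hi);
    dfloat t = two_sum(x.lo, y.lo);
    s.lo += t.hi;
    s = fast_two_sum(s.hi, s.lo);
    s.lo += t.lo;
    return fast_two_sum(s.hi, s.lo);
}

/* Lexicographic comparison: returns -1, 0 or 1.
 * Assumes both operands are normalized, as produced by df_add. */
static int df_cmp(dfloat x, dfloat y)
{
    if (x.hi < y.hi) return -1;
    if (x.hi > y.hi) return 1;
    if (x.lo < y.lo) return -1;
    if (x.lo > y.lo) return 1;
    return 0;
}
```

As a usage example, adding the pairs {1.0f, 0.0f} and {0x1p-30f, 0.0f} with df_add gives {1.0f, 0x1p-30f}, whereas the plain float sum 1.0f + 0x1p-30f rounds to 1.0f and loses the small term. df_cmp then orders that result above {1.0f, 0.0f}, which is the extra significance the question asks for.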