Why can a T* be passed in register, but a unique_ptr<T> cannot?
I'm watching Chandler Carruth's talk at CppCon 2019, "There are no Zero-Cost Abstractions". In it, he gives the example of how he was surprised by just how much overhead you incur by using an std::unique_ptr<int> over an int*; that segment starts at about time point 17:25.
You can have a look at the compilation results of his pair of example snippets (godbolt.org) to see that, indeed, the compiler seems unwilling to pass the unique_ptr value, which in the end is just an address, inside a register; it is passed only in memory.
One of the points Mr. Carruth makes at around 27:00 is that the C++ ABI requires by-value parameters (some but not all; perhaps - non-primitive types? non-trivially-constructible types?) to be passed in-memory rather than within a register.
My questions:
- Is this actually an ABI requirement on some platforms? (which?) Or maybe it's just some pessimization in certain scenarios?
- Why is the ABI like that? That is, if the fields of a struct/class fit within registers, or even a single register - why should we not be able to pass it within that register?
- Has the C++ standards committee discussed this point in recent years, or ever?
PS - So as not to leave this question with no code:
Plain pointer:
    void bar(int* ptr) noexcept;
    void baz(int* ptr) noexcept;

    void foo(int* ptr) noexcept {
        if (*ptr > 42) {
            bar(ptr);
            *ptr = 42;
        }
        baz(ptr);
    }
Unique pointer:
    #include <memory>   // std::unique_ptr
    #include <utility>  // std::move

    using std::unique_ptr;

    void bar(int* ptr) noexcept;
    void baz(unique_ptr<int> ptr) noexcept;

    void foo(unique_ptr<int> ptr) noexcept {
        if (*ptr > 42) {
            bar(ptr.get());
            *ptr = 42;
        }
        baz(std::move(ptr));
    }
- Is this actually an ABI requirement, or maybe it's just some pessimization in certain scenarios?
One example is the System V Application Binary Interface AMD64 Architecture Processor Supplement. This ABI is for 64-bit x86-compatible CPUs (the Linux x86_64 architecture). It is followed on Solaris, Linux, FreeBSD, macOS, and Windows Subsystem for Linux:
If a C++ object has either a non-trivial copy constructor or a non-trivial destructor, it is passed by invisible reference (the object is replaced in the parameter list by a pointer that has class INTEGER).
An object with either a non-trivial copy constructor or a non-trivial destructor cannot be passed by value because such objects must have well defined addresses. Similar issues apply when returning an object from a function.
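The "invisible reference" described in the quote can be modelled by hand in ordinary C++. The sketch below is only an illustration, not compiler output: read_lowered, call_lowered and demo are hypothetical names, and the placement of the temporary's destruction in the caller reflects the Itanium C++ ABI behaviour as I understand it.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// What the programmer writes: a by-value parameter.
int read_by_value(std::unique_ptr<int> p) { return *p; }

// Hand-written model of the lowered callee: it receives the address of the
// caller's temporary, so reaching the int takes two loads instead of one.
int read_lowered(std::unique_ptr<int>* hidden) { return **hidden; }

// Hand-written model of the lowered call site.
int call_lowered(std::unique_ptr<int> owner) {
    std::unique_ptr<int> temp(std::move(owner));  // temporary lives in the caller's frame
    return read_lowered(&temp);                   // only &temp travels in a register
}   // temp is destroyed here, on the caller's side (assumed Itanium-style behaviour)

// Both forms observe the same value; only the way it is reached differs.
void demo() {
    assert(read_by_value(std::unique_ptr<int>(new int(7))) == 7);
    assert(call_lowered(std::unique_ptr<int>(new int(7))) == 7);
}
```

The extra indirection in read_lowered is the overhead discussed in the question: the pointer value itself sits in memory, and only its address is passed in a register.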
Note that only 2 general-purpose registers can be used for passing 1 object with a trivial copy constructor and a trivial destructor, i.e. only objects with sizeof no greater than 16 can be passed in registers. See Calling conventions by Agner Fog for a detailed treatment of the calling conventions, in particular §7.1 Passing and returning objects. There are separate calling conventions for passing SIMD types in registers.
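As a rough illustration of the size limit, the following sketch checks the two conditions mentioned above at compile time. The helper small_enough_for_registers and the two structs are hypothetical, and the check is deliberately simplified: the real System V classification also looks at field types and alignment, and std::is_trivially_copyable is somewhat stricter than the ABI's condition.

```cpp
#include <cstdint>
#include <type_traits>

struct TwoWords   { std::int64_t a, b; };     // 16 bytes: candidate for two registers
struct ThreeWords { std::int64_t a, b, c; };  // 24 bytes: too large, goes to memory

// Hypothetical helper: a simplified stand-in for the ABI's classification.
template <class T>
constexpr bool small_enough_for_registers() {
    return std::is_trivially_copyable<T>::value && sizeof(T) <= 16;
}

static_assert(small_enough_for_registers<TwoWords>(), "fits in two 8-byte registers");
static_assert(!small_enough_for_registers<ThreeWords>(), "exceeds 16 bytes");
```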
There are different ABIs for other CPU architectures.
There is also the Itanium C++ ABI, which most compilers comply with (apart from MSVC) and which requires:
If the parameter type is non-trivial for the purposes of calls, the caller must allocate space for a temporary and pass that temporary by reference.
A type is considered non-trivial for the purposes of calls if:
- it has a non-trivial copy constructor, move constructor, or destructor, or
- all of its copy and move constructors are deleted.
This definition, as applied to class types, is intended to be the complement of the definition in [class.temporary]p3 of types for which an extra temporary is allowed when passing or returning a type. A type which is trivial for the purposes of the ABI will be passed and returned according to the rules of the base C ABI, e.g. in registers; often this has the effect of performing a trivial copy of the type.
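The rule quoted above can be approximated with standard type traits to see on which side of the line a given type falls. This is only a sketch: looks_trivial_for_calls, RawBox and BoxWithDtor are hypothetical names, and the traits do not capture every corner case of the ABI wording (the compiler applies the real rule).

```cpp
#include <memory>
#include <type_traits>

// Hypothetical helper: trait-based approximation of "trivial for the
// purposes of calls". Not the ABI's actual algorithm.
template <class T>
constexpr bool looks_trivial_for_calls() {
    return std::is_trivially_destructible<T>::value
        // each of copy/move must be either trivial or unavailable...
        && (std::is_trivially_copy_constructible<T>::value ||
            !std::is_copy_constructible<T>::value)
        && (std::is_trivially_move_constructible<T>::value ||
            !std::is_move_constructible<T>::value)
        // ...and at least one of them must be available.
        && (std::is_copy_constructible<T>::value ||
            std::is_move_constructible<T>::value);
}

struct RawBox { int* p; };                          // all special members trivial
struct BoxWithDtor { int* p; ~BoxWithDtor() {} };   // user-provided destructor

static_assert(looks_trivial_for_calls<int*>(), "plain pointer");
static_assert(looks_trivial_for_calls<RawBox>(), "trivial wrapper around a pointer");
static_assert(!looks_trivial_for_calls<BoxWithDtor>(), "non-trivial destructor");
static_assert(!looks_trivial_for_calls<std::unique_ptr<int>>(), "non-trivial move and destructor");
```

RawBox and std::unique_ptr<int> typically hold the same single pointer, yet only the former qualifies, because the decision depends on the special member functions rather than on the layout.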
- Why is the ABI like that? That is, if the fields of a struct/class fit within registers, or even a single register - why should we not be able to pass it within that register?
It is an implementation detail, but when an exception is handled, during stack unwinding, the objects with automatic storage duration being destroyed must be addressable relative to the function stack frame because the registers have been clobbered by that time. Stack unwinding code needs objects' addresses to invoke their destructors but objects in registers do not have an address.
Pedantically, destructors operate on objects:
An object occupies a region of storage in its period of construction ([class.cdtor]), throughout its lifetime, and in its period of destruction.
and an object cannot exist in C++ if no addressable storage is allocated for it, because an object's identity is its address.
When the address of an object with a trivial copy constructor that is kept in registers is needed, the compiler can just store the object into memory and obtain the address. If the copy constructor is non-trivial, on the other hand, the compiler cannot just store the object into memory; it needs to call the copy constructor, which takes a reference and hence requires the address of the source object, which an object held in registers does not have. The calling convention probably cannot depend on whether the copy constructor was inlined in the callee or not.
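A small example of why a non-trivial copy constructor ties an object to its address: the sketch below uses two hypothetical types, one whose copy constructor records the address of the new object and one that is copied bitwise. All names here are illustrative only.

```cpp
#include <cassert>

// The copy constructor depends on the address of the object being created.
struct SelfAware {
    const SelfAware* self;
    SelfAware() : self(this) {}
    SelfAware(const SelfAware&) : self(this) {}  // re-points at the new object
    bool consistent() const { return self == this; }
};

// Trivially copyable counterpart: a copy just duplicates the bits.
struct Naive {
    const Naive* self;
};

bool callee_self_aware(SelfAware s) { return s.consistent(); }
bool callee_naive(Naive n) { return n.self == &n; }

void demo() {
    SelfAware a;
    assert(callee_self_aware(a));  // the parameter was built by the copy constructor

    Naive n;
    n.self = &n;
    assert(!callee_naive(n));      // the bitwise copy still points at the original
}
```

If SelfAware were shuttled through registers and stored to some new location without running its copy constructor, the invariant checked by consistent() would break in the same way it does for Naive.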
Another way to think about this is that for trivially copyable types the compiler transfers the value of an object in registers, from which an object can be recovered by plain memory stores if necessary. E.g.:
    void f(long*);
    void g(long a) { f(&a); }

on x86_64 with the System V ABI compiles into:

    g(long):                        // Argument a is in rdi.
        push rax                    // Align the stack; a faster equivalent of sub rsp, 8.
        mov  qword ptr [rsp], rdi   // Store the value of a from rdi onto the stack to create an object.
        mov  rdi, rsp               // Load the address of the object on the stack into rdi.
        call f(long*)               // Call f with the address in rdi.
        pop  rax                    // A faster equivalent of add rsp, 8.
        ret                         // The destructor of the stack object is trivial, no code to emit.
In his thought-provoking talk, Chandler Carruth mentions that a breaking ABI change may be necessary (among other things) to implement the destructive move that could improve things. IMO, the ABI change could be non-breaking if the functions using the new ABI explicitly opt in to a new, different linkage, e.g. by declaring them in an extern "C++20" {} block (possibly in a new inline namespace for migrating existing APIs), so that only code compiled against the new function declarations with the new linkage can use the new ABI.
Note that the ABI doesn't apply when the called function has been inlined. Likewise, with link-time code generation the compiler can inline functions defined in other translation units or use custom calling conventions.