Using Unicode in C++ source code
Question
What is the standard encoding of C++ source code? Does the C++ standard even say anything about this? Can I write C++ source in Unicode?
For example, can I use non-ASCII characters such as Chinese characters in comments? If so, is full Unicode allowed or just a subset of Unicode (e.g., that 16-bit first page, or whatever it's called)?
Furthermore, can I use Unicode for strings? For example:
wstring str = L"Strange chars: â Țđ ě €€";
Recommended answer
Encoding in C++ is quite complicated. Here is my understanding of it.
Every implementation has to support the characters of the basic source character set. These include the common characters listed in §2.2/1 (§2.3/1 in C++11), and they should all fit into one char. In addition, implementations have to support a way to name other characters, called universal-character-names. These look like \uffff or \Uffffffff and can be used to refer to Unicode characters. A subset of them is usable in identifiers (listed in Annex E).
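As a small illustration of universal-character-names, the sketch below spells U+20AC (the euro sign) in both the four-digit and the eight-digit form while keeping the source file pure ASCII. It assumes a C++11 or later compiler, because it uses char32_t literals; the function names euro_ucn and euro_ucn_long are hypothetical and chosen only for this example.

```cpp
#include <string>

// U+20AC EURO SIGN written with the four-digit form \uXXXX.
// The source file itself contains only ASCII characters.
inline std::u32string euro_ucn() { return U"\u20AC"; }

// The same character written with the eight-digit form \UXXXXXXXX.
inline std::u32string euro_ucn_long() { return U"\U000020AC"; }

// A char32_t literal holds exactly one code point, so its value is
// the code point number itself.
static_assert(U'\u20AC' == 0x20AC, "UCN should denote code point U+20AC");
```

Both functions return a string of length one holding the same code point, so the two spellings are interchangeable.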
This is all nice, but the mapping from characters in the file to source characters (used at compile time) is implementation-defined. This mapping constitutes the encoding used. Here is what the standard says literally (C++98 version):
Physical source file characters are mapped, in an implementation-defined manner, to the basic source character set (introducing new-line characters for end-of-line indicators) if necessary. Trigraph sequences (2.3) are replaced by corresponding single-character internal representations. Any source file character not in the basic source character set (2.2) is replaced by the universal-character-name that designates that character. (An implementation may use any internal encoding, so long as an actual extended character encountered in the source file, and the same extended character expressed in the source file as a universal-character-name (i.e. using the \uXXXX notation), are handled equivalently.)
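The equivalence described in the quoted paragraph can be checked with a short sketch. It assumes the file is saved as UTF-8 and that the compiler decodes it as UTF-8 (the usual situation with gcc and clang, but still implementation-defined); with a different input encoding the comparison could fail or the file might not compile. The function name typed_matches_ucn is hypothetical.

```cpp
#include <string>

// ASSUMPTION: this file is stored as UTF-8 and the compiler reads it
// as UTF-8. The mapping is implementation-defined, so this is a sketch
// rather than a portable guarantee.

// Compares a euro sign typed directly into the source with the same
// character spelled as a universal-character-name.
inline bool typed_matches_ucn() {
    const std::u32string typed = U"€";       // extended character in the file
    const std::u32string named = U"\u20AC";  // same character as a UCN
    return typed == named;
}
```

If the compiler decodes the file as intended, typed_matches_ucn() returns true, because both literals end up as the single code point U+20AC.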
For gcc, you can change it using the option -finput-charset=charset. Additionally, you can change the execution character set used to represent values at runtime. The proper options for this are -fexec-charset=charset for char (it defaults to utf-8) and -fwide-exec-charset=charset for wchar_t (which defaults to either utf-16 or utf-32, depending on the size of wchar_t).
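To see the effect of the execution character sets, the following sketch counts how many code units the euro sign occupies in different kinds of literals. It assumes C++11 or later; the function names are hypothetical. Only the narrow count depends on the option -fexec-charset: with the gcc default of utf-8 it should be 3, and other charsets may give a different value.

```cpp
#include <cstddef>

// Bytes used by U+20AC in an ordinary narrow literal. The result
// depends on the execution character set (for gcc: -fexec-charset).
constexpr std::size_t narrow_euro_bytes() {
    return sizeof("\u20AC") - 1;  // subtract the terminating '\0'
}

// Bytes used by U+20AC in a u8 literal, which is always UTF-8
// regardless of the execution character set.
constexpr std::size_t utf8_euro_bytes() {
    return sizeof(u8"\u20AC") - 1;
}

// wchar_t units used by U+20AC in a wide literal. The character is in
// the Basic Multilingual Plane, so it needs a single unit whether
// wchar_t is 16 or 32 bits wide.
constexpr std::size_t wide_euro_units() {
    return sizeof(L"\u20AC") / sizeof(wchar_t) - 1;
}
```

Comparing narrow_euro_bytes() with utf8_euro_bytes() is a quick way to check whether a build really uses UTF-8 as its narrow execution character set.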