How to print UTF-8 strings to std::cout on Windows?
Problem description
I'm writing a cross-platform application in C++. All strings are UTF-8-encoded internally. Consider the following simplified code:
    #include <string>
    #include <iostream>

    int main() {
        std::string test = u8"Greek: αβγδ; German: Übergrößenträger";
        std::cout << test;
        return 0;
    }
On Unix systems, std::cout expects 8-bit strings to be UTF-8-encoded, so this code works fine.
On Windows, however, std::cout expects 8-bit strings to be in Latin-1 or a similar non-Unicode format (depending on the code page). This leads to the following output:
Greek: ╬▒╬▓╬│╬┤; German: ├£bergr├Â├ƒentr├ñger
What can I do to make std::cout interpret 8-bit strings as UTF-8 on Windows?
This is what I tried:
    #include <string>
    #include <iostream>
    #include <io.h>
    #include <fcntl.h>

    int main() {
        _setmode(_fileno(stdout), _O_U8TEXT);
        std::string test = u8"Greek: αβγδ; German: Übergrößenträger";
        std::cout << test;
        return 0;
    }
I was hoping that _setmode would do the trick. However, this results in the following assertion error in the line that calls operator<<:
Microsoft Visual C++ Runtime Library
Debug Assertion Failed!
Program: d:\visual studio 2015\Projects\utf8test\Debug\utf8test.exe
File: minkernel\crts\ucrt\src\appcrt\stdio\fputc.cpp
Line: 47
Expression: ( (_Stream.is_string_backed()) || (fn = _fileno(_Stream.public_stream()), ((_textmode_safe(fn) == __crt_lowio_text_mode::ansi) && !_tm_unicode_safe(fn))))
For information on how your program can cause an assertion failure, see the Visual C++ documentation on asserts.
Recommended answer
The problem is not std::cout but the Windows console. Using C stdio, you will get the ü with fputs( "\xc3\xbc", stdout ); after setting the UTF-8 code page (using either SetConsoleOutputCP or chcp) and selecting a Unicode-capable font in cmd's settings (Consolas should support over 2000 characters, and there are registry hacks to add more capable fonts to cmd).
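As a minimal sketch of the two steps just described (switch the code page, then write the bytes in a single fputs call), something like the following could be used. The helper names use_utf8_console and print_utf8 are hypothetical, and the #ifdef guard is an assumption added so the same file also compiles on non-Windows systems.

```cpp
#include <cstdio>
#ifdef _WIN32
#include <Windows.h>
#endif

// Hypothetical helper: ask the Windows console to interpret output as UTF-8.
// On other platforms there is nothing to switch, so it just reports success.
bool use_utf8_console() {
#ifdef _WIN32
    return SetConsoleOutputCP(CP_UTF8) != 0;
#else
    return true;
#endif
}

// Hypothetical helper: write the whole string with one fputs call so a
// multi-byte sequence is handed to stdio in one piece.
bool print_utf8(const char* text) {
    return std::fputs(text, stdout) >= 0;
}
```

Calling use_utf8_console() once at startup and then print_utf8("\xc3\xbc\n") should show ü, provided the console font contains the glyph.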
If you output one byte after the other with putc('\xc3', stdout); putc('\xbc', stdout); you will get the double tofu, because the console interprets each byte separately as an illegal character. This is probably what the C++ streams do.
See UTF-8 output on Windows console for a lengthy discussion.
For my own project, I finally implemented a std::stringbuf doing the conversion to Windows-1252. If you really need full Unicode output, however, this will not help you.
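The answer does not show that conversion, so the following is only a rough sketch of what a lossy UTF-8 to Windows-1252 step might look like. The function name utf8_to_cp1252_lossy is made up for illustration, and the sketch assumes a simplification: it keeps ASCII and U+00A0..U+00FF (where Windows-1252 and Latin-1 agree) and replaces everything else, including characters Windows-1252 places in 0x80..0x9F, with '?'.

```cpp
#include <cstddef>
#include <string>

// Sketch: convert UTF-8 to single-byte text, keeping ASCII and U+00A0..U+00FF
// and replacing every other character (or malformed sequence) with '?'.
std::string utf8_to_cp1252_lossy(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        unsigned char lead = static_cast<unsigned char>(in[i]);
        if (lead < 0x80) {                 // ASCII passes through unchanged
            out += static_cast<char>(lead);
            ++i;
            continue;
        }
        // Consume the lead byte plus all continuation bytes (10xxxxxx) after it.
        std::size_t j = i + 1;
        while (j < in.size() &&
               (static_cast<unsigned char>(in[j]) & 0xC0) == 0x80) {
            ++j;
        }
        if (j - i == 2 && (lead & 0xE0) == 0xC0) {
            // Two-byte sequence: decode the code point (at most U+07FF).
            unsigned cp = ((lead & 0x1Fu) << 6) |
                          (static_cast<unsigned char>(in[i + 1]) & 0x3Fu);
            out += (cp >= 0xA0 && cp <= 0xFF) ? static_cast<char>(cp) : '?';
        } else {
            out += '?';                    // not representable in this sketch
        }
        i = j;
    }
    return out;
}
```

With this sketch, the German part of the test string survives, while each Greek letter becomes a single '?', which is why this route does not help when full Unicode output is needed.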
An alternative approach would be replacing cout's streambuf, using fputs for the actual output:
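    #include <cstdio>
    #include <iostream>
    #include <sstream>
    #include <Windows.h>

    // Collects output in a string and hands it to fputs on every flush.
    class MBuf : public std::stringbuf {
    public:
        int sync() override {
            fputs( str().c_str(), stdout );
            str( "" );
            return 0;
        }
    };

    int main() {
        SetConsoleOutputCP( CP_UTF8 );
        setvbuf( stdout, nullptr, _IONBF, 0 );
        MBuf buf;
        std::streambuf* old = std::cout.rdbuf( &buf );
        // Before C++20, u8"..." is a plain char string; in C++20 it is
        // char8_t-based and this line no longer compiles.
        std::cout << u8"Greek: αβγδ\n" << std::flush;
        std::cout.rdbuf( old );  // restore before buf goes out of scope
    }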
I turned off output buffering here to prevent it from interfering with unfinished UTF-8 byte sequences.
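To illustrate what an "unfinished" sequence means here, the sketch below computes how much of a byte string can be written without cutting a multi-byte character in half. It is not part of the answer's code; the function name complete_utf8_prefix is hypothetical, and the sketch assumes the input is otherwise valid UTF-8.

```cpp
#include <cstddef>
#include <string>

// Sketch: length of the longest prefix of `bytes` that does not end in the
// middle of a multi-byte UTF-8 sequence. Assumes otherwise valid UTF-8.
std::size_t complete_utf8_prefix(const std::string& bytes) {
    const std::size_t n = bytes.size();
    std::size_t i = n;
    std::size_t back = 0;
    // Walk back over at most three trailing continuation bytes (10xxxxxx).
    while (i > 0 && back < 3 &&
           (static_cast<unsigned char>(bytes[i - 1]) & 0xC0) == 0x80) {
        --i;
        ++back;
    }
    if (i == 0) return n;  // no lead byte found; nothing to hold back

    // The byte before the continuation bytes starts the last character.
    unsigned char lead = static_cast<unsigned char>(bytes[i - 1]);
    std::size_t need = 1;                        // 0xxxxxxx
    if ((lead & 0xE0) == 0xC0) need = 2;         // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) need = 3;    // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) need = 4;    // 11110xxx

    const std::size_t have = n - (i - 1);        // bytes present for it
    return have < need ? i - 1 : n;
}
```

A buffering layer could write only the first complete_utf8_prefix(data) bytes and keep the remainder until the rest of the character arrives, which is the situation that disabling buffering sidesteps.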