C++ & Boost: encode/decode UTF-8
I'm trying to do a very simple task: take a unicode-aware wstring and convert it to a string, encoded as UTF-8 bytes, and then the opposite way around: take a string containing UTF-8 bytes and convert it to a unicode-aware wstring.
The problem is, I need it to be cross-platform and I need it to work with Boost... and I just can't seem to figure out a way to make it work. I've been toying with
- http://www.edobashira.com/2010/03/using-boost-code-facet-for-reading-utf8.html and
- http://www.boost.org/doc/libs/1_46_0/libs/serialization/doc/codecvt.html
I tried to convert the code to use stringstream/wstringstream instead of files or whatever, but nothing seems to work.
For instance, in Python it would look like so:
>>> u"שלום"
u'\u05e9\u05dc\u05d5\u05dd'
>>> u"שלום".encode("utf8")
'\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'
>>> '\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d'.decode("utf8")
u'\u05e9\u05dc\u05d5\u05dd'
What I'm ultimately after is this:
wchar_t uchars[] = {0x5e9, 0x5dc, 0x5d5, 0x5dd, 0};
wstring ws(uchars);
string s = encode_utf8(ws);
// s now holds "\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d"
wstring ws2 = decode_utf8(s);
// ws2 now holds {0x5e9, 0x5dc, 0x5d5, 0x5dd}
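To make the expected bytes concrete, here is a minimal library-free sketch of the encoding direction. It is not taken from Boost or any other library; `encode_utf8` is simply the hypothetical name used above, and the sketch assumes every wchar_t holds one complete code point (true where wchar_t is 32 bits, and for BMP-only text such as this example where it is 16 bits). UTF-16 surrogate pairs are not handled.

```cpp
#include <cassert>
#include <string>

// Sketch only: encodes each wchar_t as one Unicode code point in UTF-8.
// Assumption: the input contains no UTF-16 surrogate pairs.
std::string encode_utf8(const std::wstring& ws)
{
    std::string out;
    for (std::wstring::size_type i = 0; i < ws.size(); ++i) {
        const unsigned long cp = static_cast<unsigned long>(ws[i]);
        assert(cp <= 0x10FFFF);  // must be inside the Unicode range
        if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {                         // 4 bytes: 11110xxx then three 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

Each of the four Hebrew letters lies between U+0080 and U+07FF, so each takes the two-byte branch, which is why the result is eight bytes long.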
I really don't want to add another dependency on ICU or something in that spirit... but to my understanding, it should be possible with Boost.
Some sample code would be greatly appreciated! Thanks
Thanks everyone, but ultimately I resorted to http://utfcpp.sourceforge.net/, a header-only library that's very lightweight and easy to use. I'm sharing some demo code here in case anyone finds it useful:
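#include <iterator>  // std::back_inserter
#include <string>
#include "utf8.h"    // utfcpp

// utf8to32/utf32to8 treat each wchar_t as a whole code point (32-bit wchar_t);
// where wchar_t is 16 bits (Windows), utf8to16/utf16to8 are the matching calls.
inline void decode_utf8(const std::string& bytes, std::wstring& wstr)
{
    utf8::utf8to32(bytes.begin(), bytes.end(), std::back_inserter(wstr));
}

inline void encode_utf8(const std::wstring& wstr, std::string& bytes)
{
    utf8::utf32to8(wstr.begin(), wstr.end(), std::back_inserter(bytes));
}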
Usage:
wstring ws(L"\u05e9\u05dc\u05d5\u05dd");
string s;
encode_utf8(ws, s);
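For the opposite direction, the following is a rough standard-library-only sketch of what a UTF-8 decoder does. It is an illustration, not utfcpp's implementation: the return-by-value `decode_utf8` signature is the hypothetical one from the question, the sketch assumes wchar_t is wide enough for every decoded code point, and it does not reject overlong encodings or encoded surrogates as a strict validator would.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Sketch only: decodes UTF-8 bytes into one wchar_t per code point.
// Assumption: wchar_t can hold every decoded value (32-bit wchar_t,
// or BMP-only text where wchar_t is 16 bits).
std::wstring decode_utf8(const std::string& bytes)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        unsigned long cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (extra > bytes.size() - i - 1)
            throw std::invalid_argument("truncated UTF-8 sequence");

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(bytes[i + k]);
            if ((cont & 0xC0) != 0x80)
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (cont & 0x3F);  // append the 6 payload bits
        }
        out += static_cast<wchar_t>(cp);
        i += extra + 1;
    }
    return out;
}
```

Calling `decode_utf8("\xd7\xa9\xd7\x9c\xd7\x95\xd7\x9d")` on the bytes from the question yields the four code points 0x5e9, 0x5dc, 0x5d5, 0x5dd. Throwing on malformed input is a design choice made for this sketch; a library may instead offer replacement or unchecked variants.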