How to convert UTF-8 std::string to UTF-16 std::wstring?(如何将 UTF-8 std::string 转换为 UTF-16 std::wstring?)
问题描述
如果我有一个 UTF-8 std::string
,我如何将它转换为 UTF-16 std::wstring
?其实我想比较两个波斯语词.
If I have a UTF-8 std::string
how do I convert it to a UTF-16 std::wstring
? Actually, I want to compare two Persian words.
推荐答案
这是一些代码.仅经过轻微测试,可能会有一些改进.调用此函数可将 UTF-8 字符串转换为 UTF-16 wstring.如果它认为输入的字符串不是 UTF-8 则抛出异常,否则返回等效的 UTF-16 wstring.
Here's some code. Only lightly tested and there's probably a few improvements. Call this function to convert a UTF-8 string to a UTF-16 wstring. If it thinks the input string is not UTF-8 then it will throw an exception, otherwise it returns the equivalent UTF-16 wstring.
std::wstring utf8_to_utf16(const std::string& utf8)
{
std::vector<unsigned long> unicode;
size_t i = 0;
while (i < utf8.size())
{
unsigned long uni;
size_t todo;
bool error = false;
unsigned char ch = utf8[i++];
if (ch <= 0x7F)
{
uni = ch;
todo = 0;
}
else if (ch <= 0xBF)
{
throw std::logic_error("not a UTF-8 string");
}
else if (ch <= 0xDF)
{
uni = ch&0x1F;
todo = 1;
}
else if (ch <= 0xEF)
{
uni = ch&0x0F;
todo = 2;
}
else if (ch <= 0xF7)
{
uni = ch&0x07;
todo = 3;
}
else
{
throw std::logic_error("not a UTF-8 string");
}
for (size_t j = 0; j < todo; ++j)
{
if (i == utf8.size())
throw std::logic_error("not a UTF-8 string");
unsigned char ch = utf8[i++];
if (ch < 0x80 || ch > 0xBF)
throw std::logic_error("not a UTF-8 string");
uni <<= 6;
uni += ch & 0x3F;
}
if (uni >= 0xD800 && uni <= 0xDFFF)
throw std::logic_error("not a UTF-8 string");
if (uni > 0x10FFFF)
throw std::logic_error("not a UTF-8 string");
unicode.push_back(uni);
}
std::wstring utf16;
for (size_t i = 0; i < unicode.size(); ++i)
{
unsigned long uni = unicode[i];
if (uni <= 0xFFFF)
{
utf16 += (wchar_t)uni;
}
else
{
uni -= 0x10000;
utf16 += (wchar_t)((uni >> 10) + 0xD800);
utf16 += (wchar_t)((uni & 0x3FF) + 0xDC00);
}
}
return utf16;
}
这篇关于如何将 UTF-8 std::string 转换为 UTF-16 std::wstring?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:如何将 UTF-8 std::string 转换为 UTF-16 std::wstring?
基础教程推荐
- 在 C++ 中循环遍历所有 Lua 全局变量 2021-01-01
- 使用从字符串中提取的参数调用函数 2022-01-01
- 如何使图像调整大小以在 Qt 中缩放? 2021-01-01
- 从 std::cin 读取密码 2021-01-01
- 如何在不破坏 vtbl 的情况下做相当于 memset(this, ...) 的操作? 2022-01-01
- 为 C/C++ 中的项目的 makefile 生成依赖项 2022-01-01
- 管理共享内存应该分配多少内存?(助推) 2022-12-07
- 如何“在 Finder 中显示"或“在资源管理器中显 2021-01-01
- Windows Media Foundation 录制音频 2021-01-01
- 为什么语句不能出现在命名空间范围内? 2021-01-01