Convert from UTF-8 to unicode c++(从 UTF-8 转换为 unicode C++)
问题描述
如何在 c++ 应用程序中转换 ú,其中应用程序接收字符为 UTF-8 编码 %C3%BA 并将其存储为 unicode 等效 %FA.我只想知道我将如何编写代码来执行此编码过程
How do I convert ú within a c++ application where the application receives the character as UTF-8 encoding %C3%BA and store it as the unicode equivalent %FA. I just want to know how I would go about writing code to perform this encoding process
推荐答案
我昨天刚刚写了一些代码来做到这一点...
I just wrote some code to do this yesterday...
我并不是说这是做到这一点的完美"方式,但它似乎适用于我运行过的所有测试用例(我为此编写了两个方向).
I'm not saying this is the "perfect" way to do this, but it appears to work for all testcases I've run through it (I wrote both directions for that purpose).
我会让你把 "%NN" 转换成一个整数值.
I'll leave it to you to translate "%NN" to an integer value.
#include <iostream>
#include <deque>
std::deque<int> unicode_to_utf8(int charcode)
{
std::deque<int> d;
if (charcode < 128)
{
d.push_back(charcode);
}
else
{
int first_bits = 6;
const int other_bits = 6;
int first_val = 0xC0;
int t = 0;
while (charcode >= (1 << first_bits))
{
{
t = 128 | (charcode & ((1 << other_bits)-1));
charcode >>= other_bits;
first_val |= 1 << (first_bits);
first_bits--;
}
d.push_front(t);
}
t = first_val | charcode;
d.push_front(t);
}
return d;
}
int utf8_to_unicode(std::deque<int> &coded)
{
int charcode = 0;
int t = coded.front();
coded.pop_front();
if (t < 128)
{
return t;
}
int high_bit_mask = (1 << 6) -1;
int high_bit_shift = 0;
int total_bits = 0;
const int other_bits = 6;
while((t & 0xC0) == 0xC0)
{
t <<= 1;
t &= 0xff;
total_bits += 6;
high_bit_mask >>= 1;
high_bit_shift++;
charcode <<= other_bits;
charcode |= coded.front() & ((1 << other_bits)-1);
coded.pop_front();
}
charcode |= ((t >> high_bit_shift) & high_bit_mask) << total_bits;
return charcode;
}
int main()
{
int charcode;
for(;;)
{
std::cout << "Enter unicode value:" << std::endl;
std::cin >> charcode;
auto x = unicode_to_utf8(charcode);
for(auto c : x)
{
std::cout << "\x" << std::hex << c << " ";
}
std::cout << std::endl;
int c = utf8_to_unicode(x);
std::cout << "reversed:" << std::dec << c << std::hex << " in hex:" << c << std::endl;
}
}
这篇关于从 UTF-8 转换为 unicode C++的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!
本文标题为:从 UTF-8 转换为 unicode C++
基础教程推荐
- 您如何将 CreateThread 用于属于类成员的函数? 2021-01-01
- 什么是T&&(双与号)在 C++11 中是什么意思? 2022-11-04
- 运算符重载的基本规则和习语是什么? 2022-10-31
- C++ 程序在执行 std::string 分配时总是崩溃 2022-01-01
- C++,'if' 表达式中的变量声明 2021-01-01
- 如何定义双括号/双迭代器运算符,类似于向量的向量? 2022-01-01
- 如何在 C++ 中处理或避免堆栈溢出 2022-01-01
- C++ 标准:取消引用 NULL 指针以获取引用? 2021-01-01
- 调用std::Package_TASK::Get_Future()时可能出现争用情况 2022-12-17
- 设计字符串本地化的最佳方法 2022-01-01