How to get the code point number for a given character in a UTF-8 string?
Problem description
I want to get the UCS-2 code points for a given UTF-8 string. For example, the word "hello" should become something like "0068 0065 006C 006C 006F". Please note that the characters could be from any language, including complex scripts such as the East Asian languages.
So, the problem comes down to "convert a given character to its UCS-2 code point".
But how? Any kind of help would be very much appreciated, since I am in a great hurry.
Transcript of the asker's reply, originally posted as an answer
Thanks for your reply, but it needs to be done in PHP 4 or 5, not PHP 6.
The string will be user input from a form field.
I want to implement a PHP version of utf8to16 or utf8decode, like:
function get_ucs2_codepoint($char)
{
    // calculation of ucs2 codepoint value and assign it to $hex_codepoint
    return $hex_codepoint;
}
Can you help me do this in PHP, and can it be done with the PHP versions mentioned above?
Recommended answer
Scott Reynen wrote a function to convert UTF-8 into Unicode. I found it while looking at the PHP documentation. Note that it returns an escaped string, with multi-byte characters written as %uXXXX, and that it only handles 2- and 3-byte sequences.
function utf8_to_unicode( $str ) {
    $unicode = array();
    $values = array();   // bytes collected so far for the current multi-byte character
    $lookingFor = 1;     // number of bytes expected for the current character
    for ( $i = 0; $i < strlen( $str ); $i++ ) {
        $thisValue = ord( $str[ $i ] );
        if ( $thisValue < ord('A') ) {
            // exclude 0-9
            if ( $thisValue >= ord('0') && $thisValue <= ord('9') ) {
                // number
                $unicode[] = chr( $thisValue );
            } else {
                // other ASCII below 'A': escape as %xx (zero-padded to two hex digits)
                $unicode[] = sprintf( '%%%02x', $thisValue );
            }
        } else {
            if ( $thisValue < 128 ) {
                // remaining ASCII is passed through unchanged
                $unicode[] = $str[ $i ];
            } else {
                // a lead byte below 224 (0xE0) starts a 2-byte sequence; otherwise 3 bytes are assumed
                if ( count( $values ) == 0 ) $lookingFor = ( $thisValue < 224 ) ? 2 : 3;
                $values[] = $thisValue;
                if ( count( $values ) == $lookingFor ) {
                    // combine the payload bits of each byte into one code point
                    $number = ( $lookingFor == 3 ) ?
                        ( ( $values[0] % 16 ) * 4096 ) + ( ( $values[1] % 64 ) * 64 ) + ( $values[2] % 64 ) :
                        ( ( $values[0] % 32 ) * 64 ) + ( $values[1] % 64 );
                    // always emit four hex digits (the original only padded 3-digit values)
                    $unicode[] = "%u".str_pad( dechex( $number ), 4, "0", STR_PAD_LEFT );
                    $values = array();
                    $lookingFor = 1;
                } // if
            } // if
        }
    } // for
    return implode( "", $unicode );
} // utf8_to_unicode
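The function above produces an escaped string rather than the space-separated hex values the question asks for. A minimal sketch of the latter follows, assuming plain PHP without the mbstring extension. The names utf8_to_codepoints and codepoints_to_hex are hypothetical and do not come from the original answer. Malformed input is not validated (stray bytes are skipped), and code points above U+FFFF, which UCS-2 cannot represent, are printed with more than four hex digits.

```php
<?php
// Hypothetical helper (not from the original answer): decode a UTF-8 string
// into an array of integer code points. Input is assumed to be well-formed;
// invalid lead bytes and stray continuation bytes are simply skipped.
function utf8_to_codepoints($str)
{
    $codepoints = array();
    $length = strlen($str);
    $i = 0;
    while ($i < $length) {
        $byte = ord($str[$i]);
        if ($byte < 0x80) {                    // 0xxxxxxx: single byte (ASCII)
            $cp = $byte;
            $extra = 0;
        } elseif (($byte & 0xE0) === 0xC0) {   // 110xxxxx: 2-byte sequence
            $cp = $byte & 0x1F;
            $extra = 1;
        } elseif (($byte & 0xF0) === 0xE0) {   // 1110xxxx: 3-byte sequence
            $cp = $byte & 0x0F;
            $extra = 2;
        } elseif (($byte & 0xF8) === 0xF0) {   // 11110xxx: 4-byte sequence
            $cp = $byte & 0x07;
            $extra = 3;
        } else {                               // not a valid lead byte: skip it
            $i++;
            continue;
        }
        if ($i + $extra >= $length) {
            break;                             // truncated sequence at end of input
        }
        // Each continuation byte (10xxxxxx) contributes its low 6 bits.
        for ($j = 1; $j <= $extra; $j++) {
            $cp = ($cp << 6) | (ord($str[$i + $j]) & 0x3F);
        }
        $codepoints[] = $cp;
        $i += $extra + 1;
    }
    return $codepoints;
}

// Hypothetical helper: format code points as upper-case hex, zero-padded
// to at least four digits and separated by spaces.
function codepoints_to_hex($codepoints)
{
    $parts = array();
    foreach ($codepoints as $cp) {
        $parts[] = sprintf('%04X', $cp);
    }
    return implode(' ', $parts);
}

echo codepoints_to_hex(utf8_to_codepoints('hello')), "\n"; // 0068 0065 006C 006C 006F
```

For a single character, as in the get_ucs2_codepoint stub from the question, the same decoder can be called on that one character and the first array element used. If the mbstring extension happens to be available, converting with mb_convert_encoding to 'UCS-2BE' and hex-dumping the result may be a shorter route, but that is a separate approach from the one sketched here.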