UTF-8 characters in uploaded file name are jumbled on file upload

Question

I'm running a system on IIS7. The page META tag has the encoding as UTF-8, and the real encoding would appear to be the same according to the Chrome menu.

When I upload a file with a "long hyphen" in its name ("–") it gets converted to junk characters ("â€"").
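The junk pattern is what appears when the UTF-8 bytes of the dash are read as Windows-1252. As a minimal sketch (assuming the iconv extension is available and that the system code page is 1252, which the question does not state), the effect can be reproduced like this:

```php
<?php
// An en dash (U+2013) is three bytes in UTF-8: E2 80 93.
$utf8Dash = "\xE2\x80\x93";

// Converting "from CP1252 to UTF-8" treats each of those bytes as a
// separate Windows-1252 character, which reproduces the misreading.
$mojibake = iconv('CP1252', 'UTF-8', $utf8Dash);

echo bin2hex($utf8Dash), "\n"; // e28093
echo $mojibake, "\n";          // three junk characters instead of one dash

// Reversing the conversion recovers the original bytes.
$restored = iconv('UTF-8', 'CP1252', $mojibake);
var_dump($restored === $utf8Dash);
```

Because the round trip is lossless here, a name mangled this way can usually be repaired after the fact.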

The junk characters are saved in MySQL and the file name of the file on the server also has the junk characters. However when I pull the file name from the database and display it with PHP, it displays with the correct hyphen.

Is there any way to have the file name stored as UTF-8? When I try this code I get an error:

$fn = iconv("CP-1252", "UTF-8", $file['name']);
debug($fn);

Notice (8): iconv(): Wrong charset, conversion from `CP-1252' to `UTF-8' is not allowed
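The notice itself comes from the charset name: "CP-1252" with a hyphen is not a name iconv recognises, whereas "CP1252" and "Windows-1252" are commonly accepted. The sketch below uses a hypothetical helper name (cp1252ToUtf8) and only addresses the charset error. It would double-encode a name that is already UTF-8, so it is not a fix for the underlying problem.

```php
<?php
/**
 * Hypothetical helper: convert a Windows-1252 string to UTF-8.
 * Returns null instead of false when the conversion fails.
 */
function cp1252ToUtf8(string $text): ?string
{
    // "CP1252" (no hyphen) is the spelling used here; "Windows-1252" is a common alias.
    $converted = @iconv('CP1252', 'UTF-8', $text);

    return $converted === false ? null : $converted;
}

// Byte 0x96 is the en dash in Windows-1252.
echo cp1252ToUtf8("report \x96 final.txt"), "\n";
```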


Recommended answer

UPDATE Indeed this is a PHP bug on Windows. There are workarounds like below, but the best solution I have seen is to use the WFIO extension. This extension provides a new protocol wfio:// for file streams and allows PHP to properly handle UTF-8 characters on the Windows file-system. wfio:// supports a number of PHP functions including fopen, scandir, mkdir, copy, rename, etc.
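As a rough sketch of how the wrapper might be used, the hypothetical helper below prefixes a path with wfio:// only when a stream wrapper of that name is registered. The wrapper name "wfio" is assumed from the protocol name above, and the exact path format the extension expects should be checked against its own documentation.

```php
<?php
/**
 * Hypothetical helper: prefix a path with wfio:// when that stream
 * wrapper is registered, otherwise return the path unchanged.
 */
function unicodeSafePath(string $path): string
{
    // stream_get_wrappers() lists the registered stream wrappers by name.
    // The name "wfio" is an assumption based on the protocol prefix.
    if (in_array('wfio', stream_get_wrappers(), true)) {
        return 'wfio://' . $path;
    }

    return $path;
}

// Usage sketch ($tmpPath and $destination are placeholder variables):
// copy($tmpPath, unicodeSafePath($destination));
```

Falling back to the plain path keeps the same code usable on servers where the extension is not installed.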

Original solution

So this problem is related to a PHP bug on Windows: http://bugs.php.net/bug.php?id=47096

Unicode characters get mangled by PHP on move_uploaded_file, although I have also seen the issue with rename and ZipArchive, so I think it's a general issue with PHP on Windows.

I have adapted a workaround from Wordpress found here. I have to store the file with the mangled file name and then sanitize it on download/email/display.

Here are the adapted methods I'm using in case it's of use to someone in future. This still isn't much use if you're trying to zip files before downloading/emailing or you need to write the files to a network share.

public static function sanitizeFilename($filename, $utf8 = true)
{
    // Nothing to do if the name is already in the requested encoding.
    if (self::seems_utf8($filename) === (bool) $utf8) {
        return $filename;
    }

    // On Windows platforms, PHP will mangle non-ASCII characters, see http://bugs.php.net/bug.php?id=47096
    if ('WIN' === substr(PHP_OS, 0, 3)) {
        // Passing '0' returns the current locale without changing it.
        $locale = setlocale(LC_CTYPE, '0');
        if ($locale === 'C') {
            // Locale has not been set and the default is being used, according to the answer by Colin Morelli at
            // http://stackoverflow.com/questions/13788415/how-to-retrieve-the-current-windows-codepage-in-php
            // thus, we force the locale to be explicitly set to the default system locale.
            $locale = setlocale(LC_CTYPE, '');
        }

        // The code page is the part of the locale string after the dot.
        $codepage = 'Windows-' . trim(strstr($locale, '.'), '.');
        $charset = 'UTF-8';
        $converted = false;

        if (function_exists('iconv')) {
            $converted = $utf8
                ? iconv($codepage, $charset, $filename)
                : iconv($charset, $codepage . '//IGNORE', $filename);
        } elseif (function_exists('mb_convert_encoding')) {
            $converted = $utf8
                ? mb_convert_encoding($filename, $charset, $codepage)
                : mb_convert_encoding($filename, $codepage, $charset);
        }

        // Keep the original name if neither extension is available or the conversion failed.
        if ($converted !== false) {
            $filename = $converted;
        }
    }

    return $filename;
}

public static function seems_utf8($str)
{
    $length = strlen($str);
    for ($i = 0; $i < $length; $i++) {
        $c = ord($str[$i]);
        // Work out how many continuation bytes the lead byte announces.
        if ($c < 0x80) {
            $n = 0; // 0bbbbbbb
        } elseif (($c & 0xE0) == 0xC0) {
            $n = 1; // 110bbbbb
        } elseif (($c & 0xF0) == 0xE0) {
            $n = 2; // 1110bbbb
        } elseif (($c & 0xF8) == 0xF0) {
            $n = 3; // 11110bbb
        } elseif (($c & 0xFC) == 0xF8) {
            $n = 4; // 111110bb
        } elseif (($c & 0xFE) == 0xFC) {
            $n = 5; // 1111110b
        } else {
            return false; // Does not match any model
        }

        // Check that n bytes matching 10bbbbbb follow.
        for ($j = 0; $j < $n; $j++) {
            if ((++$i == $length) || ((ord($str[$i]) & 0xC0) != 0x80)) {
                return false;
            }
        }
    }

    return true;
}
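Once a stored name has been converted back to UTF-8 (for example with sanitizeFilename above), it still has to reach the browser intact on download. One common approach, sketched here with a hypothetical helper name, is to send an ASCII fallback in filename and the percent-encoded UTF-8 name in filename*. This is an illustration of the "sanitize on download" step, not code from the original workaround.

```php
<?php
/**
 * Hypothetical helper: build a Content-Disposition header value that
 * carries a UTF-8 file name plus an ASCII fallback for older clients.
 */
function contentDisposition(string $utf8Name): string
{
    // Replace every byte outside printable ASCII, plus quotes and backslashes, with "_".
    $fallback = preg_replace('/[^\x20-\x7E]|["\\\\]/', '_', $utf8Name);

    // rawurlencode() percent-encodes the UTF-8 bytes for the filename* parameter.
    return sprintf(
        "attachment; filename=\"%s\"; filename*=UTF-8''%s",
        $fallback,
        rawurlencode($utf8Name)
    );
}

// Usage sketch ($storedName is a placeholder for the name read from the database):
// header('Content-Disposition: ' . contentDisposition($storedName));
```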
