How to keep json_encode() from dropping strings with invalid characters
Question
Is there a way to keep json_encode() from returning null for a string that contains an invalid (non-UTF-8) character?
It can be a pain in the ass to debug in a complex system. It would be much more fitting to actually see the invalid character, or at least have it omitted. As it stands, json_encode() will silently drop the entire string.
Example (UTF-8):
$string = array(
    utf8_decode("Düsseldorf"), // Deliberately produce a broken string
    "Washington",
    "Nairobi"
);
print_r(json_encode($string));
Result:
[null,"Washington","Nairobi"]
Desired result:
["D�sseldorf","Washington","Nairobi"]
Note: I am not looking to make broken strings work in json_encode(). I am looking for ways to make it easier to diagnose encoding errors. A null string isn't helpful for that.
Recommended answer
PHP does try to raise an error, but only if you turn display_errors off. This is odd because the display_errors setting is only meant to control whether errors are printed to standard output, not whether an error is triggered. I want to emphasize that when you have display_errors on, even though you may see all kinds of other PHP errors, PHP doesn't just hide this error, it doesn't even trigger it. That means it will not show up in any error logs, nor will any custom error handlers get called. The error just never occurs.
Here's some code that demonstrates this:
error_reporting(-1); // report all errors
$invalid_utf8_char = chr(193);

ini_set('display_errors', 1); // display errors to standard output
var_dump(json_encode($invalid_utf8_char));
var_dump(error_get_last()); // nothing

ini_set('display_errors', 0); // do not display errors to standard output
var_dump(json_encode($invalid_utf8_char));
var_dump(error_get_last()); // json_encode(): Invalid UTF-8 sequence in argument
That bizarre and unfortunate behavior is related to this bug https://bugs.php.net/bug.php?id=47494 and a few others, and doesn't look like it will ever be fixed.
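Independently of the display_errors quirk, the failure can also be inspected explicitly. The following is a minimal sketch, assuming PHP 5.5 or later, where (as far as I know) json_encode() returns false for the whole value instead of emitting a warning, and json_last_error() / json_last_error_msg() describe what went wrong. The variable names are only illustrative and are not part of the original answer.

```php
<?php
// 0xFC is "ü" in ISO-8859-1 but not a valid UTF-8 byte, so this string is broken on purpose.
$cities = ["D\xFCsseldorf", "Washington", "Nairobi"];

$code = JSON_ERROR_NONE;
$message = '';

$json = json_encode($cities);
if ($json === false) {
    // Ask the JSON extension why the most recent call failed.
    $code = json_last_error();
    $message = json_last_error_msg();
    error_log("json_encode failed ($code): $message");
}

// JSON_PARTIAL_OUTPUT_ON_ERROR replaces the unencodable string with null
// and still returns the rest of the structure.
$partial = json_encode($cities, JSON_PARTIAL_OUTPUT_ON_ERROR);
```

Checking the return value against false and logging json_last_error_msg() does not depend on display_errors, so it should show up in the error log either way.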
Workaround:
Cleaning the string before passing it to json_encode() may be a workable solution.
$stripped_of_invalid_utf8_chars_string = iconv('UTF-8', 'UTF-8//IGNORE', $orig_string);
if ($stripped_of_invalid_utf8_chars_string !== $orig_string) {
    // one or more chars were invalid, and so they were stripped out.
    // if you need to know where in the string the first stripped character was,
    // then see http://stackoverflow.com/questions/7475437/find-first-character-that-is-different-between-two-strings
}
$json = json_encode($stripped_of_invalid_utf8_chars_string);
http://php.net/manual/en/function.iconv.php
The manual says:
//IGNORE silently discards characters that are illegal in the target charset.
So by first removing the problematic characters, in theory json_encode() shouldn't get anything it will choke on and fail with. I haven't verified that the output of iconv with the //IGNORE flag is perfectly compatible with json_encode's notion of what valid UTF-8 characters are, so buyer beware... there may be edge cases where it still fails. Ugh, I hate character set issues.
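Since the goal is to diagnose where the bad data comes from, it may help to locate the offending strings before encoding rather than strip them. Below is a rough sketch, assuming PHP 7.0+ (for the scalar type declarations). The function name find_invalid_utf8 and the 'path' / 'hex' keys are hypothetical, made up for this illustration. It relies on preg_match() with the /u modifier, which does not match when the subject is not valid UTF-8.

```php
<?php
/**
 * Hypothetical helper: walk a nested array and report every string value
 * that is not valid UTF-8, together with a dotted key path and a hex dump.
 */
function find_invalid_utf8(array $data, string $path = ''): array
{
    $bad = [];
    foreach ($data as $key => $value) {
        $here = $path === '' ? (string) $key : $path . '.' . $key;
        if (is_array($value)) {
            // Recurse into nested arrays and append whatever they report.
            $bad = array_merge($bad, find_invalid_utf8($value, $here));
        } elseif (is_string($value) && preg_match('//u', $value) !== 1) {
            // With /u, PCRE refuses subjects that are not valid UTF-8.
            $bad[] = ['path' => $here, 'hex' => bin2hex($value)];
        }
    }
    return $bad;
}

$report = find_invalid_utf8([
    'cities' => ["D\xFCsseldorf", "Washington"],
    'capital' => 'Nairobi',
]);
```

The hex dump makes the stray byte (fc here) visible even when the terminal or log viewer cannot render it.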
Edit
In PHP 7.2+, there seem to be some new flags for json_encode(): JSON_INVALID_UTF8_IGNORE and JSON_INVALID_UTF8_SUBSTITUTE.
There's not much documentation yet, but for now, this test should help you understand expected behavior:
https://github.com/php/php-src/blob/master/ext/json/tests/json_encode_invalid_utf8.phpt
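As a small illustration of those two flags applied to the example from the question (a sketch assuming PHP 7.2+; the variable names are mine): JSON_INVALID_UTF8_IGNORE drops the invalid byte, while JSON_INVALID_UTF8_SUBSTITUTE replaces it with U+FFFD, which is close to the desired result shown above.

```php
<?php
// Same deliberately broken input as in the question (0xFC is not valid UTF-8).
$cities = ["D\xFCsseldorf", "Washington", "Nairobi"];

// Drop the invalid byte entirely.
$ignored = json_encode($cities, JSON_INVALID_UTF8_IGNORE);

// Replace the invalid byte with U+FFFD, escaped as \ufffd by default.
$substituted = json_encode($cities, JSON_INVALID_UTF8_SUBSTITUTE);

// Same substitution, but emit the replacement character as raw UTF-8 bytes.
$visible = json_encode($cities, JSON_INVALID_UTF8_SUBSTITUTE | JSON_UNESCAPED_UNICODE);

echo $ignored, "\n", $substituted, "\n", $visible, "\n";
```

For diagnosis, the substitute variant is probably the more useful one, because the replacement character marks the exact position of the bad byte.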
And, in PHP 7.3+ there's the new flag JSON_THROW_ON_ERROR. See http://php.net/manual/en/class.jsonexception.php
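A minimal sketch of how JSON_THROW_ON_ERROR might be used, assuming PHP 7.3+. The wrapper name encode_or_report is hypothetical and only serves the example. Instead of a silent null or false, the failure surfaces as a JsonException that can be logged or rethrown.

```php
<?php
/**
 * Hypothetical wrapper: return the JSON string, or log the reason and return null.
 */
function encode_or_report(array $data): ?string
{
    try {
        return json_encode($data, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        // getCode() carries the same value json_last_error() would report,
        // e.g. JSON_ERROR_UTF8 for malformed UTF-8 input.
        error_log('json_encode failed: ' . $e->getMessage() . ' (code ' . $e->getCode() . ')');
        return null;
    }
}

$ok = encode_or_report(["Washington", "Nairobi"]);
$broken = encode_or_report(["D\xFCsseldorf"]);
```

In a larger system it may be preferable to let the exception propagate, so the stack trace points at the code that produced the broken string.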