Parsing large JSON file in .NET
Question
So far I have used the JsonConvert.DeserializeObject method of Json.NET, which has worked quite well; to be honest, I haven't needed anything more than this.
I am working on a background (console) application that constantly downloads JSON content from different URLs and then deserializes the result into a list of .NET objects.
using (WebClient client = new WebClient())
{
    string json = client.DownloadString(stringUrl);
    var result = JsonConvert.DeserializeObject<List<Contact>>(json);
}
The simple code snippet above probably doesn't look perfect, but it does the job. When the file is large (15,000 contacts, a 48 MB file), however, JsonConvert.DeserializeObject no longer works: that line throws a JsonReaderException.
The downloaded JSON content is an array, and a sample looks like this. Contact is a container class for the deserialized JSON objects.
[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
My initial guess was that it runs out of memory. Just out of curiosity, I tried parsing it as a JArray, which caused the same exception.
I have started digging into the Json.NET documentation and reading similar threads. As I haven't managed to produce a working solution yet, I decided to post a question here.
UPDATE: While deserializing line by line, I got the same error: " [. Path '', line 600003, position 1." So I downloaded two of the files and checked them in Notepad++. I noticed that if the array has more than 12,000 elements, then after the 12,000th element the array is closed with "]" and another array starts with "[". In other words, the JSON looks exactly like this:
[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
[
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  },
  {
    "firstname": "sometext",
    "lastname": "sometext"
  }
]
Recommended answer
As you've correctly diagnosed in your update, the issue is that the JSON has a closing ] followed immediately by an opening [ to start the next set. This format makes the JSON invalid when taken as a whole, which is why Json.NET throws an error.
Fortunately, this problem seems to come up often enough that Json.NET actually has a special setting to deal with it. If you use a JsonTextReader directly to read the JSON, you can set the SupportMultipleContent flag to true and then use a loop to deserialize each item individually.
This should allow you to process the non-standard JSON successfully and in a memory-efficient manner, regardless of how many arrays there are or how many items are in each array.
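using System;
using System.IO;
using System.Net;
using Newtonsoft.Json;

// Stream the response instead of downloading it into one large string.
using (WebClient client = new WebClient())
using (Stream stream = client.OpenRead(stringUrl))
using (StreamReader streamReader = new StreamReader(stream))
using (JsonTextReader reader = new JsonTextReader(streamReader))
{
    // Allow several top-level JSON values (here: back-to-back arrays) in one stream.
    reader.SupportMultipleContent = true;
    var serializer = new JsonSerializer();
    while (reader.Read())
    {
        // Each StartObject token marks the beginning of one contact.
        if (reader.TokenType == JsonToken.StartObject)
        {
            Contact c = serializer.Deserialize<Contact>(reader);
            Console.WriteLine(c.FirstName + " " + c.LastName);
        }
    }
}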
Full demo: https://dotnetfiddle.net/2TQa8p
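To try the technique without a network call, here is a minimal self-contained sketch that runs the same loop over an in-memory string made of two concatenated arrays. The Contact class shown here, the MultiContentDemo class, and the ReadContacts helper are illustrative assumptions and not part of the original question; only JsonTextReader, SupportMultipleContent and JsonSerializer come from Json.NET.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Newtonsoft.Json;

// Hypothetical container class; property names are assumed from the sample JSON.
public class Contact
{
    [JsonProperty("firstname")]
    public string FirstName { get; set; }

    [JsonProperty("lastname")]
    public string LastName { get; set; }
}

public static class MultiContentDemo
{
    // Lazily yields one Contact per JSON object, across any number of
    // back-to-back top-level arrays in the input.
    public static IEnumerable<Contact> ReadContacts(TextReader input)
    {
        using (var reader = new JsonTextReader(input))
        {
            // Without this flag, the reader rejects content after the first array.
            reader.SupportMultipleContent = true;
            var serializer = new JsonSerializer();
            while (reader.Read())
            {
                if (reader.TokenType == JsonToken.StartObject)
                {
                    yield return serializer.Deserialize<Contact>(reader);
                }
            }
        }
    }

    public static void Main()
    {
        // Two arrays concatenated, mimicking the shape described in the question.
        const string json =
            "[{\"firstname\":\"a\",\"lastname\":\"b\"}]\n" +
            "[{\"firstname\":\"c\",\"lastname\":\"d\"}]";

        foreach (Contact c in ReadContacts(new StringReader(json)))
        {
            Console.WriteLine(c.FirstName + " " + c.LastName);
        }
    }
}
```

Because ReadContacts is an iterator, only one Contact is materialized at a time, so the caller can process or batch items as they arrive rather than holding the whole list in memory. For real input, pass the StreamReader from the answer above in place of the StringReader.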